CHAPTER 1
ABOUT THIS MANUAL 


The Intel® 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A, 2B, 2C & 2D: Instruction Set 
Reference (order numbers 253666, 253667, 326018 and 334569) are part of a set that describes the architecture 
and programming environment of all Intel 64 and IA-32 architecture processors. Other volumes in this set are: 

• The Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1: Basic Architecture (order
number 253665).

• The Intel® 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A, 3B, 3C & 3D: System 
Programming Guide (order numbers 253668, 253669, 326019 and 332831). 

The Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, describes the basic architecture
and programming environment of Intel 64 and IA-32 processors. The Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volumes 2A, 2B, 2C & 2D, describe the instruction set of the processor and the opcode structure.
These volumes are intended for application programmers and for programmers who write operating systems or
executives. The Intel® 64 and IA-32 Architectures Software Developer's Manual, Volumes 3A, 3B, 3C & 3D, describe
the operating-system support environment of Intel 64 and IA-32 processors. These volumes target operating-system
and BIOS designers. In addition, the Intel® 64 and IA-32 Architectures Software Developer's Manual,
Volume 3B, addresses the programming environment for classes of software that host operating systems.


1.1 INTEL® 64 AND IA-32 PROCESSORS COVERED IN THIS MANUAL 

This manual set includes information pertaining primarily to the most recent Intel 64 and IA-32 processors, which 
include: 

• Pentium® processors 

• P6 family processors 

• Pentium® 4 processors 

• Pentium® M processors 

• Intel® Xeon® processors 

• Pentium® D processors 

• Pentium® processor Extreme Editions 

• 64-bit Intel® Xeon® processors 

• Intel® Core™ Duo processor 

• Intel® Core™ Solo processor 

• Dual-Core Intel® Xeon® processor LV 

• Intel® Core™2 Duo processor 

• Intel® Core™2 Quad processor Q6000 series 

• Intel® Xeon® processor 3000, 3200 series 

• Intel® Xeon® processor 5000 series 

• Intel® Xeon® processor 5100, 5300 series 

• Intel® Core™2 Extreme processor X7000 and X6800 series 

• Intel® Core™2 Extreme processor QX6000 series 

• Intel® Xeon® processor 7100 series 

• Intel® Pentium® Dual-Core processor 

• Intel® Xeon® processor 7200, 7300 series 

• Intel® Xeon® processor 5200, 5400, 7400 series 
• Intel® Core™2 Extreme processor QX9000 and X9000 series

• Intel® Core™2 Quad processor Q9000 series 

• Intel® Core™2 Duo processor E8000, T9000 series 

• Intel® Atom™ processor family 

• Intel® Atom™ processors 200, 300, D400, D500, D2000, N200, N400, N2000, E2000, Z500, Z600, Z2000,
C1000 series, built on 45 nm and 32 nm processes

• Intel® Core™ i7 processor

• Intel® Core™ i5 processor

• Intel® Xeon® processor E7-8800/4800/2800 product families 

• Intel® Core™ i7-3930K processor 

• 2nd generation Intel® Core™ i7-2xxx, Intel® Core™ i5-2xxx, Intel® Core™ i3-2xxx processor series 

• Intel® Xeon® processor E3-1200 product family 

• Intel® Xeon® processor E5-2400/1400 product family 

• Intel® Xeon® processor E5-4600/2600/1600 product family 

• 3rd generation Intel® Core™ processors 

• Intel® Xeon® processor E3-1200 v2 product family 

• Intel® Xeon® processor E5-2400/1400 v2 product families 

• Intel® Xeon® processor E5-4600/2600/1600 v2 product families 

• Intel® Xeon® processor E7-8800/4800/2800 v2 product families 

• 4th generation Intel® Core™ processors 

• The Intel® Core™ M processor family 

• Intel® Core™ i7-59xx Processor Extreme Edition 

• Intel® Core™ i7-49xx Processor Extreme Edition 

• Intel® Xeon® processor E3-1200 v3 product family 

• Intel® Xeon® processor E5-2600/1600 v3 product families 

• 5th generation Intel® Core™ processors 

• Intel® Xeon® processor D-1500 product family 

• Intel® Xeon® processor E5 v4 family 

• Intel® Atom™ processor X7-Z8000 and X5-Z8000 series 

• Intel® Atom™ processor Z3400 series 

• Intel® Atom™ processor Z3500 series 

• 6th generation Intel® Core™ processors 

• Intel® Xeon® processor E3-1500m v5 product family 

P6 family processors are IA-32 processors based on the P6 family microarchitecture. This includes the Pentium® 
Pro, Pentium® II, Pentium® III, and Pentium® III Xeon® processors. 

The Pentium® 4, Pentium® D, and Pentium® processor Extreme Editions are based on the Intel NetBurst®
microarchitecture. Most early Intel® Xeon® processors are based on the Intel NetBurst® microarchitecture. The Intel
Xeon processor 5000 and 7100 series are based on the Intel NetBurst® microarchitecture.

The Intel® Core™ Duo, Intel® Core™ Solo and dual-core Intel® Xeon® processor LV are based on an improved 
Pentium® M processor microarchitecture. 

The Intel® Xeon® processor 3000, 3200, 5100, 5300, 7200, and 7300 series, Intel® Pentium® dual-core, Intel®
Core™2 Duo, Intel® Core™2 Quad, and Intel® Core™2 Extreme processors are based on Intel® Core™
microarchitecture.
The Intel® Xeon® processor 5200, 5400, 7400 series, Intel® Core™2 Quad processor Q9000 series, and Intel®
Core™2 Extreme processors QX9000, X9000 series, Intel® Core™2 processor E8000 series are based on Enhanced 
Intel® Core™ microarchitecture. 

The Intel® Atom™ processors 200, 300, D400, D500, D2000, N200, N400, N2000, E2000, Z500, Z600, Z2000, 
C1000 series are based on the Intel® Atom™ microarchitecture and support Intel 64 architecture.

The Intel® Core™ i7 processor and Intel® Xeon® processor 3400, 5500, 7500 series are based on 45 nm Intel® 
microarchitecture code name Nehalem. Intel® microarchitecture code name Westmere is a 32 nm version of Intel® 
microarchitecture code name Nehalem. Intel® Xeon® processor 5600 series, Intel Xeon processor E7 and various 
Intel Core i7, i5, i3 processors are based on Intel® microarchitecture code name Westmere. These processors 
support Intel 64 architecture. 

The Intel® Xeon® processor E5 family, Intel® Xeon® processor E3-1200 family, Intel® Xeon® processor
E7-8800/4800/2800 product families, Intel® Core™ i7-3930K processor, and 2nd generation Intel® Core™ i7-2xxx,
Intel® Core™ i5-2xxx, Intel® Core™ i3-2xxx processor series are based on the Intel® microarchitecture code name 
Sandy Bridge and support Intel 64 architecture. 

The Intel® Xeon® processor E7-8800/4800/2800 v2 product families, Intel® Xeon® processor E3-1200 v2 product 
family and 3rd generation Intel® Core™ processors are based on the Intel® microarchitecture code name Ivy 
Bridge and support Intel 64 architecture. 

The Intel® Xeon® processor E5-4600/2600/1600 v2 product families, Intel® Xeon® processor E5-2400/1400 v2 
product families and Intel® Core™ i7-49xx Processor Extreme Edition are based on the Intel® microarchitecture 
code name Ivy Bridge-E and support Intel 64 architecture. 

The Intel® Xeon® processor E3-1200 v3 product family and 4th generation Intel® Core™ processors are based on
the Intel® microarchitecture code name Haswell and support Intel 64 architecture. 

The Intel® Core™ M processor family, 5th generation Intel® Core™ processors, Intel® Xeon® processor D-1500 
product family and the Intel® Xeon® processor E5 v4 family are based on the Intel® microarchitecture code name 
Broadwell and support Intel 64 architecture. 

The Intel® Xeon® processor E3-1500m v5 product family and 6th generation Intel® Core™ processors are based 
on the Intel® microarchitecture code name Skylake and support Intel 64 architecture. 

The Intel® Xeon® processor E5-2600/1600 v3 product families and the Intel® Core™ i7-59xx Processor Extreme 
Edition are based on the Intel® microarchitecture code name Haswell-E and support Intel 64 architecture. 

The Intel® Atom™ processor Z8000 series is based on the Intel microarchitecture code name Airmont. 

The Intel® Atom™ processor Z3400 series and the Intel® Atom™ processor Z3500 series are based on the Intel 
microarchitecture code name Silvermont. 

P6 family, Pentium® M, Intel® Core™ Solo, Intel® Core™ Duo processors, dual-core Intel® Xeon® processor LV, 
and early generations of Pentium 4 and Intel Xeon processors support IA-32 architecture. The Intel® Atom™ 
processor Z5xx series supports IA-32 architecture.

The Intel® Xeon® processor 3000, 3200, 5000, 5100, 5200, 5300, 5400, 7100, 7200, 7300, 7400 series, Intel®
Core™2 Duo, Intel® Core™2 Extreme, Intel® Core™2 Quad processors, Pentium® D processors, Pentium® Dual-Core
processor, newer generations of Pentium 4 and Intel Xeon processor family support Intel® 64 architecture.

IA-32 architecture is the instruction set architecture and programming environment for Intel's 32-bit
microprocessors. Intel® 64 architecture is the instruction set architecture and programming environment which is the
superset of Intel's 32-bit and 64-bit architectures. It is compatible with the IA-32 architecture.
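Whether a particular processor implements Intel 64 architecture or only IA-32 architecture can be determined at run time with the CPUID instruction. The C sketch below is illustrative only: it assumes the CPUID.80000001H:EDX.LM[bit 29] feature flag and the `__get_cpuid` helper from the GCC/Clang header `<cpuid.h>`, and the function names are hypothetical. On non-x86 builds the probe simply reports false.

```c
#include <stdbool.h>
#include <stdint.h>

#if defined(__i386__) || defined(__x86_64__)
#include <cpuid.h>   /* GCC/Clang helper header for the CPUID instruction */
#endif

/* CPUID.80000001H:EDX.LM[bit 29]: set when Intel 64 architecture is available. */
#define CPUID_80000001H_EDX_LM (UINT32_C(1) << 29)

/* Interpret an EDX value that was already returned by CPUID leaf 80000001H. */
bool edx_reports_intel64(uint32_t edx)
{
    return (edx & CPUID_80000001H_EDX_LM) != 0;
}

/* Hypothetical run-time probe. Returns false when the extended leaf is not
 * implemented or when the code is not compiled for an x86 target. */
bool cpu_supports_intel64(void)
{
#if defined(__i386__) || defined(__x86_64__)
    unsigned int eax, ebx, ecx, edx;
    /* __get_cpuid returns 0 if leaf 80000001H exceeds the highest extended
     * leaf reported by CPUID.80000000H. */
    if (!__get_cpuid(0x80000001u, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_reports_intel64(edx);
#else
    return false;
#endif
}
```

A set LM flag reports only that the processor supports Intel 64 architecture; it does not indicate whether the operating system is currently running in IA-32e mode.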


1.2 OVERVIEW OF VOLUME 2A, 2B, 2C AND 2D: INSTRUCTION SET REFERENCE 

A description of Intel® 64 and IA-32 Architectures Software Developer's Manual, Volumes 2A, 2B, 2C & 2D content
follows: 

Chapter 1 — About This Manual. Gives an overview of all seven volumes of the Intel® 64 and IA-32 Architectures
Software Developer's Manual. It also describes the notational conventions in these manuals and lists related
Intel® manuals and documentation of interest to programmers and hardware designers.
Chapter 2 — Instruction Format. Describes the machine-level instruction format used for all IA-32 instructions
and gives the allowable encodings of prefixes, the operand-identifier byte (ModR/M byte), the addressing-mode 
specifier byte (SIB byte), and the displacement and immediate bytes. 

Chapter 3 — Instruction Set Reference, A-L. Describes Intel 64 and IA-32 instructions in detail, including an 
algorithmic description of operations, the effect on flags, the effect of operand- and address-size attributes, and 
the exceptions that may be generated. The instructions are arranged in alphabetical order. General-purpose, x87 
FPU, Intel MMX™ technology, SSE/SSE2/SSE3/SSSE3/SSE4 extensions, and system instructions are included. 

Chapter 4 — Instruction Set Reference, M-U. Continues the description of Intel 64 and IA-32 instructions 
started in Chapter 3. It starts Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2B. 

Chapter 5 — Instruction Set Reference, V-Z. Continues the description of Intel 64 and IA-32 instructions
started in Chapters 3 and 4. It provides the balance of the alphabetized list of instructions and starts Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 2C.

Chapter 6 — Safer Mode Extensions Reference. Describes the safer mode extensions (SMX). SMX is intended
for a system executive to support launching a measured environment in a platform where the identity of the
software controlling the platform hardware can be measured for the purpose of making trust decisions. This chapter
starts Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2D.

Appendix A — Opcode Map. Gives an opcode map for the IA-32 instruction set. 

Appendix B — Instruction Formats and Encodings. Gives the binary encoding of each form of each IA-32
instruction. 

Appendix C — Intel® C/C++ Compiler Intrinsics and Functional Equivalents. Lists the Intel® C/C++ compiler
intrinsics and their assembly code equivalents for each of the IA-32 MMX and SSE/SSE2/SSE3 instructions.


1.3 NOTATIONAL CONVENTIONS 

This manual uses specific notation for data-structure formats, for symbolic representation of instructions, and for 
hexadecimal and binary numbers. A review of this notation makes the manual easier to read. 


1.3.1 Bit and Byte Order 

In illustrations of data structures in memory, smaller addresses appear toward the bottom of the figure; addresses 
increase toward the top. Bit positions are numbered from right to left. The numerical value of a set bit is equal to 
two raised to the power of the bit position. IA-32 processors are "little endian" machines; this means the bytes of 
a word are numbered starting from the least significant byte. Figure 1-1 illustrates these conventions. 


[Figure: a data structure drawn as rows of four bytes at byte offsets 0 (lowest address, bottom) through 28
(highest address, top). Bit offsets are numbered 31 to 0 from left to right; Byte 0 appears at bits 7 to 0 of the
lowest row.]

Figure 1-1. Bit and Byte Order
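The byte-order convention can be checked with a short C sketch. The helpers `store_le32`, `load_le32`, and `bit_value` are illustrative names. Because the bytes are produced with shifts rather than by copying the host's memory, the sketch yields the little-endian layout on any host.

```c
#include <stdint.h>

/* Lay out a 32-bit value as a little-endian IA-32 processor stores it:
 * byte 0 (lowest address) receives the least significant byte. */
void store_le32(uint8_t out[4], uint32_t value)
{
    for (int i = 0; i < 4; i++)
        out[i] = (uint8_t)(value >> (8 * i));   /* byte i holds bits 8i+7..8i */
}

/* Reassemble the value from memory order; inverse of store_le32. */
uint32_t load_le32(const uint8_t in[4])
{
    uint32_t value = 0;
    for (int i = 0; i < 4; i++)
        value |= (uint32_t)in[i] << (8 * i);
    return value;
}

/* Numerical value of a set bit: two raised to the bit position (0..31). */
uint32_t bit_value(unsigned position)
{
    return UINT32_C(1) << position;
}
```

For the value 12345678H, byte 0 (the lowest address) receives 78H and byte 3 receives 12H.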
1.3.2 Reserved Bits and Software Compatibility

In many register and memory layout descriptions, certain bits are marked as reserved. When bits are marked as
reserved, it is essential for compatibility with future processors that software treat these bits as having a future,
though unknown, effect. The behavior of reserved bits should be regarded as not only undefined, but
unpredictable. Software should follow these guidelines in dealing with reserved bits:

• Do not depend on the states of any reserved bits when testing the values of registers which contain such bits. 
Mask out the reserved bits before testing. 

• Do not depend on the states of any reserved bits when storing to memory or to a register. 

• Do not depend on the ability to retain information written into any reserved bits. 

• When loading a register, always load the reserved bits with the values indicated in the documentation, if any, 
or reload them with values previously read from the same register. 

NOTE 

Avoid any software dependence upon the state of reserved bits in IA-32 registers. Depending upon 
the values of reserved register bits will make software dependent upon the unspecified manner in 
which the processor handles these bits. Programs that depend upon reserved values risk
incompatibility with future processors.
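The guidelines for reserved bits can be sketched in C as mask-and-merge operations. The register layout here is hypothetical (bits 0 through 7 and bit 12 defined, all others reserved) and was chosen only for illustration. Real masks must come from the documentation of the specific register.

```c
#include <stdint.h>

/* Hypothetical 32-bit register layout used only for illustration:
 * bits 0-7 and bit 12 are defined, every other bit is reserved. */
#define DEFINED_MASK  UINT32_C(0x000010FF)
#define RESERVED_MASK (~DEFINED_MASK)

/* Mask out the reserved bits before testing a register value. */
int defined_bits_equal(uint32_t reg_value, uint32_t expected)
{
    return (reg_value & DEFINED_MASK) == (expected & DEFINED_MASK);
}

/* When loading the register, reload the reserved bits with the values
 * previously read from the same register (read-modify-write). */
uint32_t merge_for_write(uint32_t value_read, uint32_t new_defined_bits)
{
    return (value_read & RESERVED_MASK) | (new_defined_bits & DEFINED_MASK);
}
```

The merge step is the read-modify-write pattern from the last guideline: the reserved bits written back are exactly those previously read from the same register.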


1.3.3 Instruction Operands 

When instructions are represented symbolically, a subset of the IA-32 assembly language is used. In this subset, 
an instruction has the following format: 

label: mnemonic argument1, argument2, argument3
where: 

• A label is an identifier which is followed by a colon. 

• A mnemonic is a reserved name for a class of instruction opcodes which have the same function. 

• The operands argument1, argument2, and argument3 are optional. There may be from zero to three operands,
depending on the opcode. When present, they take the form of either literals or identifiers for data items. 
Operand identifiers are either reserved names of registers or are assumed to be assigned to data items 
declared in another part of the program (which may not be shown in the example). 

When two operands are present in an arithmetic or logical instruction, the right operand is the source and the left 
operand is the destination. 

For example: 

LOADREG: MOV EAX, SUBTOTAL 

In this example, LOADREG is a label, MOV is the mnemonic identifier of an opcode, EAX is the destination operand, 
and SUBTOTAL is the source operand. Some assembly languages put the source and destination in reverse order. 
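The statement format can be illustrated with a small C sketch that splits one line into its label, mnemonic, and operands. The `asm_stmt` structure and `parse_stmt` function are hypothetical. A real assembler must also handle comments, memory-operand syntax, and many other details ignored here. Following this manual's convention, `operands[0]` holds the destination and `operands[1]` holds the source.

```c
#include <ctype.h>
#include <string.h>

#define MAX_OPERANDS 3
#define FIELD_LEN    32

/* Hypothetical in-memory form of "label: mnemonic argument1, argument2, argument3". */
struct asm_stmt {
    char label[FIELD_LEN];                  /* empty string if there is no label */
    char mnemonic[FIELD_LEN];
    char operands[MAX_OPERANDS][FIELD_LEN]; /* operands[0] is the destination */
    int  operand_count;
};

/* Copy [begin, end) into dst, trimming blanks and truncating to FIELD_LEN - 1. */
static void copy_trimmed(char *dst, const char *begin, const char *end)
{
    while (begin < end && isspace((unsigned char)*begin))
        begin++;
    while (end > begin && isspace((unsigned char)end[-1]))
        end--;
    size_t n = (size_t)(end - begin);
    if (n >= FIELD_LEN)
        n = FIELD_LEN - 1;
    memcpy(dst, begin, n);
    dst[n] = '\0';
}

/* Returns 0 on success or -1 if more than three operands are present.
 * Commas inside an operand are not supported. */
int parse_stmt(const char *line, struct asm_stmt *out)
{
    memset(out, 0, sizeof *out);
    const char *p = line;

    /* A label is an identifier immediately followed by a colon. */
    while (isspace((unsigned char)*p))
        p++;
    const char *id = p;
    while (isalnum((unsigned char)*p) || *p == '_')
        p++;
    if (p > id && *p == ':') {
        copy_trimmed(out->label, id, p);
        p++;
    } else {
        p = id;                             /* no label: rescan from the start */
    }

    /* The mnemonic is the next blank-delimited word. */
    while (isspace((unsigned char)*p))
        p++;
    const char *m = p;
    while (*p && !isspace((unsigned char)*p))
        p++;
    copy_trimmed(out->mnemonic, m, p);

    /* The rest of the line is a comma-separated operand list. */
    while (isspace((unsigned char)*p))
        p++;
    while (*p) {
        const char *comma = strchr(p, ',');
        const char *end = comma ? comma : p + strlen(p);
        if (out->operand_count == MAX_OPERANDS)
            return -1;
        copy_trimmed(out->operands[out->operand_count++], p, end);
        p = comma ? comma + 1 : end;
    }
    return 0;
}
```

Parsing the LOADREG example yields label LOADREG, mnemonic MOV, destination EAX, and source SUBTOTAL. An assembler that uses the reverse operand order would swap the two operands.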


1.3.4 Hexadecimal and Binary Numbers 

Base 16 (hexadecimal) numbers are represented by a string of hexadecimal digits followed by the character H (for 
example, F82EH). A hexadecimal digit is a character from the following set: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, 
E, and F. 

Base 2 (binary) numbers are represented by a string of 1s and 0s, sometimes followed by the character B (for
example, 1010B). The "B" designation is only used in situations where confusion as to the type of number might
arise. 
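A reader-side C sketch of this notation follows. `parse_manual_number` is a hypothetical helper. Treating a string with no suffix as decimal is an assumption of the sketch, because the manual omits the B suffix only where the base is clear from context.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Parse a number written in this manual's notation:
 *   "F82EH" -> hexadecimal (trailing H)
 *   "1010B" -> binary (trailing B)
 * Assumption of this sketch: a string with neither suffix is decimal.
 * Returns 0 on success, -1 on malformed input. */
int parse_manual_number(const char *text, unsigned long *out)
{
    char digits[64];
    size_t len = strlen(text);
    int base = 10;

    if (len == 0 || len >= sizeof digits)
        return -1;

    /* Only the last character is a suffix; a B elsewhere (as in 0BH)
     * is an ordinary hexadecimal digit. */
    char suffix = (char)toupper((unsigned char)text[len - 1]);
    if (suffix == 'H') {
        base = 16;
        len--;
    } else if (suffix == 'B') {
        base = 2;
        len--;
    }
    if (len == 0 || !isxdigit((unsigned char)text[0]))
        return -1;                      /* reject signs, blanks, empty digits */

    memcpy(digits, text, len);
    digits[len] = '\0';

    char *end;
    *out = strtoul(digits, &end, base);
    return (*end == '\0') ? 0 : -1;     /* every character must be a valid digit */
}
```

For example, "F82EH" yields 63534 and "1010B" yields 10.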
1.3.5 Segmented Addressing

The processor uses byte addressing. This means memory is organized and accessed as a sequence of bytes. 
Whether one or more bytes are being accessed, a byte address is used to locate the byte or bytes in memory. The 
range of memory that can be addressed is called an address space. 

The processor also supports segmented addressing. This is a form of addressing where a program may have many 
independent address spaces, called segments. For example, a program can keep its code (instructions) and stack 
in separate segments. Code addresses would always refer to the code space, and stack addresses would always 
refer to the stack space. The following notation is used to specify a byte address within a segment: 

Segment-register:Byte-address

For example, the following segment address identifies the byte at address FF79H in the segment pointed to by the DS
register:

DS:FF79H 

The following segment address identifies an instruction address in the code segment. The CS register points to the 
code segment and the EIP register contains the address of the instruction. 

CS:EIP 
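To show how a segment and a byte address combine into one location, the C sketch below computes a linear address. It assumes real-address mode, where the segment base is the segment register value multiplied by 16. In protected mode the base comes from a segment descriptor instead. The structure and function names are illustrative, and the DS value 1234H used in the example is made up.

```c
#include <stdint.h>

/* A Segment-register:Byte-address pair, e.g. DS:FF79H. */
struct seg_addr {
    uint16_t segment;   /* value held in the segment register */
    uint16_t offset;    /* byte address within the segment */
};

/* Real-address mode only (assumption of this sketch): the segment base is
 * the segment register value multiplied by 16. */
uint32_t real_mode_segment_base(uint16_t segment)
{
    return (uint32_t)segment << 4;
}

/* Linear address = segment base + byte address within the segment. */
uint32_t real_mode_linear(struct seg_addr a)
{
    return real_mode_segment_base(a.segment) + a.offset;
}
```

With DS = 1234H, DS:FF79H therefore refers to linear address 12340H + FF79H = 222B9H.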


1.3.6 Exceptions 

An exception is an event that typically occurs when an instruction causes an error. For example, an attempt to 
divide by zero generates an exception. However, some exceptions, such as breakpoints, occur under other
conditions. Some types of exceptions may provide error codes. An error code reports additional information about the
error. An example of the notation used to show an exception and error code is shown below: 

#PF(fault code) 

This example refers to a page-fault exception under conditions where an error code naming a type of fault is 
reported. Under some conditions, exceptions which produce error codes may not be able to report an accurate 
code. In this case, the error code is zero, as shown below for a general-protection exception: 

#GP(0) 
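As an example of the information an error code carries, the C sketch below decodes the error code delivered with #PF. The bit assignments (P, W/R, U/S, RSVD, and I/D in bits 0 through 4) follow the page-fault error-code description in Volume 3A. The structure and function names are illustrative, and the sketch ignores higher bits defined by newer processors.

```c
#include <stdbool.h>
#include <stdint.h>

/* Page-fault (#PF) error-code bits. */
#define PF_P     (1u << 0)  /* 0: non-present page; 1: protection violation */
#define PF_WR    (1u << 1)  /* 1: the access was a write */
#define PF_US    (1u << 2)  /* 1: the access was a user-mode access */
#define PF_RSVD  (1u << 3)  /* 1: a reserved bit was set in a paging entry */
#define PF_ID    (1u << 4)  /* 1: the access was an instruction fetch */

struct pf_info {
    bool protection_violation;
    bool write;
    bool user_mode;
    bool reserved_bit;
    bool instruction_fetch;
};

/* Decode the error code that accompanies #PF(fault code). */
struct pf_info decode_pf_error(uint32_t error_code)
{
    struct pf_info info = {
        .protection_violation = (error_code & PF_P) != 0,
        .write                = (error_code & PF_WR) != 0,
        .user_mode            = (error_code & PF_US) != 0,
        .reserved_bit         = (error_code & PF_RSVD) != 0,
        .instruction_fetch    = (error_code & PF_ID) != 0,
    };
    return info;
}
```

An error code of 6 (bits 1 and 2 set, bit 0 clear) therefore describes a user-mode write to a page that is not present.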


1.3.7 A New Syntax for CPUID, CR, and MSR Values 

Obtain feature flags, status, and system information by using the CPUID instruction, by checking control register 
bits, and by reading model-specific registers. We are moving toward a new syntax to represent this information. 
See Figure 1-2. 
CPUID Input and Output

CPUID.01H:EDX.SSE[bit 25] = 1

• 01H: input value for EAX register
• EDX.SSE[bit 25]: output register and feature flag or field name with bit position(s)
• 1: value (or range) of output

Control Register Values

CR4.OSFXSR[bit 9] = 1

• CR4: example CR name
• OSFXSR[bit 9]: feature flag or field name with bit position(s)
• 1: value (or range) of output

Model-Specific Register Values

IA32_MISC_ENABLE.ENABLEFOPCODE[bit 2] = 1

• IA32_MISC_ENABLE: example MSR name
• ENABLEFOPCODE[bit 2]: feature flag or field name with bit position(s)
• 1: value (or range) of output


Figure 1-2. Syntax for CPUID, CR, and MSR Data Presentation
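Each expression in this syntax names a register, a field, and the field's bit position(s). Evaluating it therefore reduces to shifting and masking a value that has already been read. The C sketch below uses hypothetical helper names and leaves out how the value is obtained (executing CPUID, reading CR4, or executing RDMSR). Reading CR4 or an MSR requires privilege level 0.

```c
#include <stdint.h>

/* Extract bits hi..lo (inclusive) of a register value, as named by the
 * "[bit n]" part of the notation (hi == lo for a single bit).
 * Preconditions: lo <= hi <= 63. */
uint64_t reg_field(uint64_t value, unsigned hi, unsigned lo)
{
    unsigned width = hi - lo + 1;
    uint64_t mask = (width == 64) ? ~UINT64_C(0) : ((UINT64_C(1) << width) - 1);
    return (value >> lo) & mask;
}

/* True when the single-bit flag at position bit is set; for example,
 * bit 9 of a CR4 value for CR4.OSFXSR[bit 9] = 1. */
int flag_is_set(uint64_t value, unsigned bit)
{
    return reg_field(value, bit, bit) == 1;
}
```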


1.4 RELATED LITERATURE 

Literature related to Intel 64 and IA-32 processors is listed and viewable on-line at: 
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html 
See also: 

• The data sheet for a particular Intel 64 or IA-32 processor 

• The specification update for a particular Intel 64 or IA-32 processor 

• Intel® C++ Compiler documentation and online help:
http://software.intel.com/en-us/articles/intel-compilers/ 

• Intel® Fortran Compiler documentation and online help: 
http://software.intel.com/en-us/articles/intel-compilers/ 
• Intel® Software Development Tools:
http://www.intel.com/cd/software/products/asmo-na/eng/index.htm 

• Intel® 64 and IA-32 Architectures Software Developer's Manual (in three or seven volumes): 
http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html 

• Intel® 64 and IA-32 Architectures Optimization Reference Manual: 
http://www.intel.com/content/www/us/en/architecture-and-technology/64-ia-32-architectures-optimization-manual.html

• Intel 64 Architecture x2APIC Specification:
http://www.intel.com/content/www/us/en/architecture-and-technology/64-architecture-x2apic-specification.html

• Intel® Trusted Execution Technology Measured Launched Environment Programming Guide: 
http://www.intel.com/content/www/us/en/software-developers/intel-txt-software-development-guide.html 

• Developing Multi-threaded Applications: A Platform Consistent Approach: 
https://software.intel.com/sites/default/files/article/147714/51534-developing-multithreaded-applications.pdf

• Using Spin-Loops on Intel® Pentium® 4 Processor and Intel® Xeon® Processor: 
http://software.intel.com/en-us/articles/ap949-using-spin-loops-on-intel-pentiumr-4-processor-and-intel-xeonr-processor/

• Performance Monitoring Unit Sharing Guide:
http://software.intel.com/file/30388 

Literature related to selected features in future Intel processors is available at:

• Intel® Architecture Instruction Set Extensions Programming Reference 
https://software.intel.com/en-us/isa-extensions 

• Intel® Software Guard Extensions (Intel® SGX) Programming Reference 
https://software.intel.com/en-us/isa-extensions/intel-sgx 

More relevant links are: 

• Intel® Developer Zone: 
https://software.intel.com/en-us 

• Developer centers:
http://www.intel.com/content/www/us/en/hardware-developers/developer-centers.html

• Processor support general link: 
http://www.intel.com/support/processors/ 

• Software products and packages: 
http://www.intel.com/cd/software/products/asmo-na/eng/index.htm 

• Intel® Hyper-Threading Technology (Intel® HT Technology): 
http://www.intel.com/technology/platform-technology/hyper-threading/index.htm 
CHAPTER 2
INSTRUCTION FORMAT 


This chapter describes the instruction format for all Intel 64 and IA-32 processors. The instruction format for
protected mode, real-address mode and virtual-8086 mode is described in Section 2.1. Extensions to the format for
IA-32e mode and its sub-modes are described in Section 2.2.


2.1 INSTRUCTION FORMAT FOR PROTECTED MODE, REAL-ADDRESS MODE, 
AND VIRTUAL-8086 MODE 

The Intel 64 and IA-32 architectures instruction encodings are subsets of the format shown in Figure 2-1.
Instructions consist of optional instruction prefixes (in any order), primary opcode bytes (up to three bytes), an
addressing-form specifier (if required) consisting of the ModR/M byte and sometimes the SIB (Scale-Index-Base)
byte, a displacement (if required), and an immediate data field (if required).


Instruction Prefixes | Opcode | ModR/M | SIB | Displacement | Immediate

• Instruction Prefixes: prefixes of 1 byte each (optional)¹,²
• Opcode: 1-, 2-, or 3-byte opcode
• ModR/M: 1 byte (if required)
• SIB: 1 byte (if required)
• Displacement: address displacement of 1, 2, or 4 bytes or none³
• Immediate: immediate data of 1, 2, or 4 bytes or none³

ModR/M byte: Mod (bits 7-6) | Reg/Opcode (bits 5-3) | R/M (bits 2-0)
SIB byte: Scale (bits 7-6) | Index (bits 5-3) | Base (bits 2-0)

1. The REX prefix is optional, but if used must be immediately before the opcode; see Section 2.2.1, "REX Prefixes" for additional information.

2. For VEX encoding information, see Section 2.3, "Intel® Advanced Vector Extensions (Intel® AVX)".

3. Some rare instructions can take an 8B immediate or 8B displacement.


Figure 2-1. Intel 64 and IA-32 Architectures Instruction Format


2.1.1 Instruction Prefixes 

Instruction prefixes are divided into four groups, each with a set of allowable prefix codes. For each instruction, it 
is only useful to include up to one prefix code from each of the four groups (Groups 1, 2, 3, 4). Groups 1 through 4 
may be placed in any order relative to each other. 

• Group 1 

— Lock and repeat prefixes: 

• LOCK prefix is encoded using F0H.

• REPNE/REPNZ prefix is encoded using F2H. Repeat-Not-Zero prefix applies only to string and
input/output instructions. (F2H is also used as a mandatory prefix for some instructions.)

• REP or REPE/REPZ is encoded using F3H. The repeat prefix applies only to string and input/output
instructions. F3H is also used as a mandatory prefix for POPCNT, LZCNT and ADOX instructions.

— Bound prefix is encoded using F2H if the following conditions are true:

• CPUID.(EAX=07H, ECX=0):EBX.MPX[bit 14] is set. 


• BNDCFGU.EN and/or IA32_BNDCFGS.EN is set. 

• When the F2 prefix precedes a near CALL, a near RET, a near JMP, or a near Jcc instruction (see Chapter 
17, "Intel® MPX," of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1). 

• Group 2 

— Segment override prefixes: 

• 2EH—CS segment override (use with any branch instruction is reserved). 

• 36H—SS segment override prefix (use with any branch instruction is reserved). 

• 3EH—DS segment override prefix (use with any branch instruction is reserved). 

• 26H—ES segment override prefix (use with any branch instruction is reserved). 

• 64H—FS segment override prefix (use with any branch instruction is reserved). 

• 65H—GS segment override prefix (use with any branch instruction is reserved). 

— Branch hints¹:

• 2EH—Branch not taken (used only with Jcc instructions). 

• 3EH—Branch taken (used only with Jcc instructions). 

• Group 3 

• Operand-size override prefix is encoded using 66H (66H is also used as a mandatory prefix for some 
instructions). 

• Group 4 

• 67H—Address-size override prefix. 

The LOCK prefix (F0H) forces an operation that ensures exclusive use of shared memory in a multiprocessor
environment. See "LOCK—Assert LOCK# Signal Prefix" in Chapter 3, "Instruction Set Reference, A-L," for a description
of this prefix.

Repeat prefixes (F2H, F3H) cause an instruction to be repeated for each element of a string. Use these prefixes 
only with string and I/O instructions (MOVS, CMPS, SCAS, LODS, STOS, INS, and OUTS). Use of repeat prefixes 
and/or undefined opcodes with other Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable 
behavior. 

Some instructions may use F2H or F3H as a mandatory prefix to express distinct functionality.

Branch hint prefixes (2EH, 3EH) allow a program to give a hint to the processor about the most likely code path for 
a branch. Use these prefixes only with conditional branch instructions (Jcc). Other use of branch hint prefixes 
and/or other undefined opcodes with Intel 64 or IA-32 instructions is reserved; such use may cause unpredictable 
behavior. 

The operand-size override prefix allows a program to switch between 16- and 32-bit operand sizes. Either size can 
be the default; use of the prefix selects the non-default size. 
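As a concrete case, the same opcode byte can move either a 32-bit or a 16-bit immediate, depending on whether the operand-size override prefix is present. The C sketch below assumes a code segment whose default operand size is 32 bits and uses the MOV r32, imm32 form (opcode B8H plus the register number; see MOV in Chapter 4). The function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode MOV reg, imm for a code segment whose default operand size is
 * 32 bits (assumption of this sketch). reg is 0-7 (EAX/AX = 0 ... EDI/DI = 7).
 * Opcode B8H+reg takes an immediate of the current operand size; the 66H
 * prefix selects the non-default (16-bit) size. Returns the byte count. */
size_t encode_mov_imm(uint8_t *buf, unsigned reg, uint32_t imm, int use_16bit)
{
    size_t n = 0;
    if (use_16bit)
        buf[n++] = 0x66;                      /* operand-size override prefix */
    buf[n++] = (uint8_t)(0xB8 + (reg & 7u));  /* register number in opcode */
    unsigned imm_bytes = use_16bit ? 2 : 4;
    for (unsigned i = 0; i < imm_bytes; i++)
        buf[n++] = (uint8_t)(imm >> (8 * i)); /* little-endian immediate */
    return n;
}
```

MOV EAX, 12345678H encodes as B8 78 56 34 12, while MOV AX, 1234H encodes as 66 B8 34 12. In a code segment whose default operand size is 16 bits the roles are reversed, and 66H selects the 32-bit form.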

Some SSE2/SSE3/SSSE3/SSE4 instructions and instructions using a three-byte sequence of primary opcode bytes 
may use 66H as a mandatory prefix to express distinct functionality. 

Other use of the 66H prefix is reserved; such use may cause unpredictable behavior. 

The address-size override prefix (67H) allows programs to switch between 16- and 32-bit addressing. Either size 
can be the default; the prefix selects the non-default size. Using this prefix and/or other undefined opcodes when 
operands for the instruction do not reside in memory is reserved; such use may cause unpredictable behavior. 
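The prefix grouping rules lend themselves to a table-driven check. The C sketch below, with hypothetical function names, classifies a byte into Groups 1 through 4 and counts the legacy prefixes at the start of an instruction, flagging any group that appears more than once. It does not distinguish mandatory-prefix uses of 66H, F2H, and F3H, and it does not handle REX or VEX prefixes.

```c
#include <stddef.h>
#include <stdint.h>

/* Return the legacy-prefix group (1-4) of a byte, or 0 if it is not a
 * legacy prefix. 2EH and 3EH are reported as Group 2 whether they act as
 * segment overrides or as branch hints. */
int prefix_group(uint8_t b)
{
    switch (b) {
    case 0xF0: case 0xF2: case 0xF3:          return 1; /* LOCK, REPNE, REP */
    case 0x2E: case 0x36: case 0x3E:
    case 0x26: case 0x64: case 0x65:          return 2; /* segment overrides */
    case 0x66:                                return 3; /* operand size */
    case 0x67:                                return 4; /* address size */
    default:                                  return 0;
    }
}

/* Count the leading legacy prefixes in an instruction byte stream.
 * *dup_group receives the first group seen more than once (0 if none),
 * since at most one prefix per group is useful. */
size_t scan_prefixes(const uint8_t *code, size_t len, int *dup_group)
{
    unsigned seen = 0;                 /* bit g set once group g has appeared */
    size_t i = 0;
    *dup_group = 0;
    for (; i < len; i++) {
        int g = prefix_group(code[i]);
        if (g == 0)
            break;                     /* first byte that is not a legacy prefix */
        if ((seen & (1u << g)) && *dup_group == 0)
            *dup_group = g;
        seen |= 1u << g;
    }
    return i;
}
```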


1. Some earlier microarchitectures used these as branch hints, but recent generations have not and they are reserved for future hint 
usage. 
2.1.2 Opcodes

A primary opcode can be 1, 2, or 3 bytes in length. An additional 3-bit opcode field is sometimes encoded in the
ModR/M byte. Smaller fields can be defined within the primary opcode. Such fields define the direction of
operation, size of displacements, register encoding, condition codes, or sign extension. Encoding fields used by an
opcode vary depending on the class of operation.

Two-byte opcode formats for general-purpose and SIMD instructions consist of one of the following: 

• An escape opcode byte 0FH as the primary opcode and a second opcode byte.

• A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, and a second opcode byte (same as previous 
bullet). 

For example, CVTDQ2PD consists of the following sequence: F3 0F E6. The first byte is a mandatory prefix (it is not
considered as a repeat prefix). 

Three-byte opcode formats for general-purpose and SIMD instructions consist of one of the following: 

• An escape opcode byte 0FH as the primary opcode, plus two additional opcode bytes.

• A mandatory prefix (66H, F2H, or F3H), an escape opcode byte, plus two additional opcode bytes (same as 
previous bullet). 

For example, PHADDW for XMM registers consists of the following sequence: 66 0F 38 01. The first byte is the
mandatory prefix. 

Valid opcode expressions are defined in Appendix A and Appendix B. 
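Telling these opcode forms apart requires only the bytes that follow the prefixes. The C sketch below is illustrative and uses hypothetical names. It assumes legacy (non-VEX) encoding, treats 0F 38 and 0F 3A as the escape sequences for three-byte opcodes, and leaves recognition of a mandatory prefix to the caller, since 66H, F2H, and F3H look like ordinary prefixes at the byte level.

```c
#include <stddef.h>
#include <stdint.h>

enum opcode_map { MAP_ONE_BYTE, MAP_0F, MAP_0F38, MAP_0F3A };

/* Classify the opcode starting at code[0] (prefixes already skipped) and
 * return its length in bytes (1, 2, or 3), or 0 if the buffer is too short.
 * 0FH escapes to a two-byte opcode; 0F 38 and 0F 3A escape to three-byte
 * opcodes. */
size_t opcode_length(const uint8_t *code, size_t len, enum opcode_map *map)
{
    if (len < 1)
        return 0;
    if (code[0] != 0x0F) {
        *map = MAP_ONE_BYTE;
        return 1;
    }
    if (len < 2)
        return 0;
    if (code[1] == 0x38 || code[1] == 0x3A) {
        if (len < 3)
            return 0;
        *map = (code[1] == 0x38) ? MAP_0F38 : MAP_0F3A;
        return 3;
    }
    *map = MAP_0F;
    return 2;
}
```

For CVTDQ2PD (F3 0F E6), the caller skips the F3H mandatory prefix and the function reports a two-byte opcode. For PHADDW (66 0F 38 01), it reports a three-byte opcode.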


2.1.3 ModR/M and SIB Bytes 

Many instructions that refer to an operand in memory have an addressing-form specifier byte (called the ModR/M
byte) following the primary opcode. The ModR/M byte contains three fields of information:

• The mod field combines with the r/m field to form 32 possible values: eight registers and 24 addressing modes. 

• The reg/opcode field specifies either a register number or three more bits of opcode information. The purpose 
of the reg/opcode field is specified in the primary opcode. 

• The r/m field can specify a register as an operand or it can be combined with the mod field to encode an 
addressing mode. Sometimes, certain combinations of the mod field and the r/m field are used to express 
opcode information for some instructions. 

Certain encodings of the ModR/M byte require a second addressing byte (the SIB byte). The base-plus-index and
scale-plus-index forms of 32-bit addressing require the SIB byte. The SIB byte includes the following fields:

• The scale field specifies the scale factor. 

• The index field specifies the register number of the index register.

• The base field specifies the register number of the base register. 

See Section 2.1.5 for the encodings of the ModR/M and SIB bytes. 
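The field layout shown in Figure 2-1 (Mod in bits 7-6, Reg/Opcode in bits 5-3, and R/M in bits 2-0, with Scale, Index, and Base in the same positions of the SIB byte) can be decoded with shifts and masks. The C sketch below uses hypothetical structure and function names and only extracts the fields. Interpreting them is the job of the tables in Section 2.1.5.

```c
#include <stdint.h>

struct modrm { unsigned mod, reg, rm; };        /* bits 7-6, 5-3, 2-0 */
struct sib   { unsigned scale, index, base; };  /* bits 7-6, 5-3, 2-0 */

struct modrm decode_modrm(uint8_t b)
{
    struct modrm m = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return m;
}

struct sib decode_sib(uint8_t b)
{
    struct sib s = { (b >> 6) & 3u, (b >> 3) & 7u, b & 7u };
    return s;
}

/* The scale field selects a factor of 1, 2, 4, or 8 (2 to the power scale). */
unsigned sib_scale_factor(struct sib s)
{
    return 1u << s.scale;
}
```

In 32-bit addressing, SIB byte 88H therefore selects a scale factor of 4, ECX as the index, and EAX as the base, that is, [EAX+ECX*4] plus any displacement selected by the Mod field.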


2.1.4 Displacement and Immediate Bytes 

Some addressing forms include a displacement immediately following the ModR/M byte (or the SIB byte if one is 
present). If a displacement is required, it can be 1, 2, or 4 bytes. 

If an instruction specifies an immediate operand, the operand always follows any displacement bytes. An
immediate operand can be 1, 2 or 4 bytes.
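Whether a displacement follows the ModR/M byte, and how long it is, depends on the Mod and R/M fields and, in one case, on the base field of the SIB byte. The C sketch below encodes those rules for 32-bit addressing as listed in Table 2-2 and Table 2-3. The function name is hypothetical, and the sketch does not cover the 16-bit forms of Table 2-1, which follow different rules.

```c
#include <stdbool.h>
#include <stdint.h>

/* For 32-bit addressing, report whether a SIB byte follows the ModR/M byte
 * and how many displacement bytes follow after that. The sib argument is
 * examined only when a SIB byte is present. */
void addr32_layout(uint8_t modrm, uint8_t sib, bool *has_sib, unsigned *disp_bytes)
{
    unsigned mod = modrm >> 6, rm = modrm & 7u;

    *has_sib = (mod != 3 && rm == 4);        /* R/M = 100B selects a SIB byte */
    switch (mod) {
    case 0:
        /* R/M = 101B means disp32 with no base; so does SIB base = 101B. */
        if (rm == 5 || (*has_sib && (sib & 7u) == 5))
            *disp_bytes = 4;
        else
            *disp_bytes = 0;
        break;
    case 1:  *disp_bytes = 1; break;         /* disp8, sign-extended */
    case 2:  *disp_bytes = 4; break;         /* disp32 */
    default: *disp_bytes = 0; break;         /* Mod = 11B: register operand */
    }
}
```

Any immediate operand comes after the displacement bytes whose count this function reports.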
2.1.5 Addressing-Mode Encoding of ModR/M and SIB Bytes

The values and corresponding addressing forms of the ModR/M and SIB bytes are shown in Table 2-1 through Table 
2-3: 16-bit addressing forms specified by the ModR/M byte are in Table 2-1 and 32-bit addressing forms are in 
Table 2-2. Table 2-3 shows 32-bit addressing forms specified by the SIB byte. In cases where the reg/opcode field 
in the ModR/M byte represents an extended opcode, valid encodings are shown in Appendix B. 

In Table 2-1 and Table 2-2, the Effective Address column lists 32 effective addresses that can be assigned to the 
first operand of an instruction by using the Mod and R/M fields of the ModR/M byte. The first 24 options provide 
ways of specifying a memory location; the last eight (Mod = 11B) provide ways of specifying general-purpose, MMX
technology and XMM registers. 

The Mod and R/M columns in Table 2-1 and Table 2-2 give the binary encodings of the Mod and R/M fields required 
to obtain the effective address listed in the first column. For example: see the row indicated by Mod = 11B, R/M =
000B. The row identifies the general-purpose registers EAX, AX or AL; MMX technology register MM0; or XMM
register XMM0. The register used is determined by the opcode byte and the operand-size attribute.

Now look at the seventh row in either table (labeled "REG ="). This row specifies the use of the 3-bit Reg/Opcode 
field when the field is used to give the location of a second operand. The second operand must be a general- 
purpose, MMX technology, or XMM register. Rows one through five list the registers that may correspond to the 
value in the table. Again, the register used is determined by the opcode byte along with the operand-size attribute. 

If the instruction does not require a second operand, then the Reg/Opcode field may be used as an opcode
extension. This use is represented by the sixth row in the tables (labeled "/digit (Opcode)"). Note that values in row
six are represented in decimal form.

The body of Table 2-1 and Table 2-2 (under the label "Value of ModR/M Byte (in Hexadecimal)") contains a 32 by
8 array that presents all 256 values of the ModR/M byte (in hexadecimal). Bits 3, 4 and 5 are specified by the
column of the table in which a byte resides. The row specifies bits 0, 1 and 2, and bits 6 and 7. The figure below
demonstrates the interpretation of one table value.


Mod  R/M  /digit (Opcode); REG = 001
11   000  C8H (11001000B)

Figure 2-2. Table Interpretation of ModR/M Byte (C8H)
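The same decomposition can be written in code. This Python sketch splits a ModR/M byte into its three fields and packs them back together. The function names are hypothetical. Applied to C8H, it reproduces the reading shown in Figure 2-2.

```python
from __future__ import annotations

def split_modrm(byte: int) -> tuple[int, int, int]:
    """Return (mod, reg_or_digit, rm) for a ModR/M byte."""
    mod = (byte >> 6) & 0b11   # bits 7:6 select the row group in Tables 2-1/2-2
    reg = (byte >> 3) & 0b111  # bits 5:3 select the column (REG or /digit)
    rm = byte & 0b111          # bits 2:0 select the row within the group
    return mod, reg, rm

def join_modrm(mod: int, reg: int, rm: int) -> int:
    """Inverse of split_modrm: pack the three fields into one byte."""
    return (mod << 6) | (reg << 3) | rm

print(split_modrm(0xC8))  # (3, 1, 0): Mod = 11B, REG = 001B, R/M = 000B
```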
Table 2-1. 16-Bit Addressing Forms with the ModR/M Byte

r8(/r)                        AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                       AX    CX    DX    BX    SP    BP¹   SI    DI
r32(/r)                       EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
mm(/r)                        MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
xmm(/r)                       XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
(In decimal) /digit (Opcode)  0     1     2     3     4     5     6     7
(In binary) REG =             000   001   010   011   100   101   110   111

Effective Address   Mod  R/M  Value of ModR/M Byte (in Hexadecimal)
[BX+SI]             00   000  00    08    10    18    20    28    30    38
[BX+DI]             00   001  01    09    11    19    21    29    31    39
[BP+SI]             00   010  02    0A    12    1A    22    2A    32    3A
[BP+DI]             00   011  03    0B    13    1B    23    2B    33    3B
[SI]                00   100  04    0C    14    1C    24    2C    34    3C
[DI]                00   101  05    0D    15    1D    25    2D    35    3D
disp16²             00   110  06    0E    16    1E    26    2E    36    3E
[BX]                00   111  07    0F    17    1F    27    2F    37    3F
[BX+SI]+disp8³      01   000  40    48    50    58    60    68    70    78
[BX+DI]+disp8       01   001  41    49    51    59    61    69    71    79
[BP+SI]+disp8       01   010  42    4A    52    5A    62    6A    72    7A
[BP+DI]+disp8       01   011  43    4B    53    5B    63    6B    73    7B
[SI]+disp8          01   100  44    4C    54    5C    64    6C    74    7C
[DI]+disp8          01   101  45    4D    55    5D    65    6D    75    7D
[BP]+disp8          01   110  46    4E    56    5E    66    6E    76    7E
[BX]+disp8          01   111  47    4F    57    5F    67    6F    77    7F
[BX+SI]+disp16      10   000  80    88    90    98    A0    A8    B0    B8
[BX+DI]+disp16      10   001  81    89    91    99    A1    A9    B1    B9
[BP+SI]+disp16      10   010  82    8A    92    9A    A2    AA    B2    BA
[BP+DI]+disp16      10   011  83    8B    93    9B    A3    AB    B3    BB
[SI]+disp16         10   100  84    8C    94    9C    A4    AC    B4    BC
[DI]+disp16         10   101  85    8D    95    9D    A5    AD    B5    BD
[BP]+disp16         10   110  86    8E    96    9E    A6    AE    B6    BE
[BX]+disp16         10   111  87    8F    97    9F    A7    AF    B7    BF
EAX/AX/AL/MM0/XMM0  11   000  C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL/MM1/XMM1  11   001  C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL/MM2/XMM2  11   010  C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL/MM3/XMM3  11   011  C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH/MM4/XMM4  11   100  C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH/MM5/XMM5  11   101  C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH/MM6/XMM6  11   110  C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH/MM7/XMM7  11   111  C7    CF    D7    DF    E7    EF    F7    FF

NOTES:

1. The default segment register is SS for the effective addresses containing a BP index, DS for other effective addresses.

2. The disp16 nomenclature denotes a 16-bit displacement that follows the ModR/M byte and that is added to the index.

3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte and that is sign-extended and added to the
index.
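This Python sketch maps a 16-bit ModR/M byte to its effective-address form from Table 2-1. It applies note 1 to pick the default segment. The helper is hypothetical and is not part of any Intel tool. For Mod = 11B it returns only the 16-bit register name, as a simplification.

```python
from __future__ import annotations

# Effective-address forms of Table 2-1, indexed by the R/M field.
BASE16 = ["[BX+SI]", "[BX+DI]", "[BP+SI]", "[BP+DI]", "[SI]", "[DI]", "[BP]", "[BX]"]
REG16 = ["AX", "CX", "DX", "BX", "SP", "BP", "SI", "DI"]

def decode_modrm16(byte: int) -> tuple[str, str | None]:
    """Return (effective-address form, default segment) for a 16-bit ModR/M byte.

    The segment is None when Mod = 11B (register operand)."""
    mod, rm = byte >> 6, byte & 0b111
    if mod == 0b11:
        return REG16[rm], None             # register operand (r16 shown)
    if mod == 0b00 and rm == 0b110:
        return "disp16", "DS"              # direct address, no BP involved
    ea = BASE16[rm]
    if mod == 0b01:
        ea += "+disp8"                     # sign-extended 8-bit displacement
    elif mod == 0b10:
        ea += "+disp16"
    seg = "SS" if "BP" in ea else "DS"     # note 1: BP-based forms default to SS
    return ea, seg
```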




Table 2-2. 32-Bit Addressing Forms with the ModR/M Byte

r8(/r)                        AL    CL    DL    BL    AH    CH    DH    BH
r16(/r)                       AX    CX    DX    BX    SP    BP    SI    DI
r32(/r)                       EAX   ECX   EDX   EBX   ESP   EBP   ESI   EDI
mm(/r)                        MM0   MM1   MM2   MM3   MM4   MM5   MM6   MM7
xmm(/r)                       XMM0  XMM1  XMM2  XMM3  XMM4  XMM5  XMM6  XMM7
(In decimal) /digit (Opcode)  0     1     2     3     4     5     6     7
(In binary) REG =             000   001   010   011   100   101   110   111

Effective Address   Mod  R/M  Value of ModR/M Byte (in Hexadecimal)
[EAX]               00   000  00    08    10    18    20    28    30    38
[ECX]               00   001  01    09    11    19    21    29    31    39
[EDX]               00   010  02    0A    12    1A    22    2A    32    3A
[EBX]               00   011  03    0B    13    1B    23    2B    33    3B
[--][--]¹           00   100  04    0C    14    1C    24    2C    34    3C
disp32²             00   101  05    0D    15    1D    25    2D    35    3D
[ESI]               00   110  06    0E    16    1E    26    2E    36    3E
[EDI]               00   111  07    0F    17    1F    27    2F    37    3F
[EAX]+disp8³        01   000  40    48    50    58    60    68    70    78
[ECX]+disp8         01   001  41    49    51    59    61    69    71    79
[EDX]+disp8         01   010  42    4A    52    5A    62    6A    72    7A
[EBX]+disp8         01   011  43    4B    53    5B    63    6B    73    7B
[--][--]+disp8      01   100  44    4C    54    5C    64    6C    74    7C
[EBP]+disp8         01   101  45    4D    55    5D    65    6D    75    7D
[ESI]+disp8         01   110  46    4E    56    5E    66    6E    76    7E
[EDI]+disp8         01   111  47    4F    57    5F    67    6F    77    7F
[EAX]+disp32        10   000  80    88    90    98    A0    A8    B0    B8
[ECX]+disp32        10   001  81    89    91    99    A1    A9    B1    B9
[EDX]+disp32        10   010  82    8A    92    9A    A2    AA    B2    BA
[EBX]+disp32        10   011  83    8B    93    9B    A3    AB    B3    BB
[--][--]+disp32     10   100  84    8C    94    9C    A4    AC    B4    BC
[EBP]+disp32        10   101  85    8D    95    9D    A5    AD    B5    BD
[ESI]+disp32        10   110  86    8E    96    9E    A6    AE    B6    BE
[EDI]+disp32        10   111  87    8F    97    9F    A7    AF    B7    BF
EAX/AX/AL/MM0/XMM0  11   000  C0    C8    D0    D8    E0    E8    F0    F8
ECX/CX/CL/MM1/XMM1  11   001  C1    C9    D1    D9    E1    E9    F1    F9
EDX/DX/DL/MM2/XMM2  11   010  C2    CA    D2    DA    E2    EA    F2    FA
EBX/BX/BL/MM3/XMM3  11   011  C3    CB    D3    DB    E3    EB    F3    FB
ESP/SP/AH/MM4/XMM4  11   100  C4    CC    D4    DC    E4    EC    F4    FC
EBP/BP/CH/MM5/XMM5  11   101  C5    CD    D5    DD    E5    ED    F5    FD
ESI/SI/DH/MM6/XMM6  11   110  C6    CE    D6    DE    E6    EE    F6    FE
EDI/DI/BH/MM7/XMM7  11   111  C7    CF    D7    DF    E7    EF    F7    FF

NOTES:

1. The [--][--] nomenclature means a SIB follows the ModR/M byte.

2. The disp32 nomenclature denotes a 32-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is
added to the index.

3. The disp8 nomenclature denotes an 8-bit displacement that follows the ModR/M byte (or the SIB byte if one is present) and that is
sign-extended and added to the index.
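To make these notes concrete, this Python sketch takes a 32-bit ModR/M byte and reports two things: whether a SIB byte follows, and how many displacement bytes come after. The function name is hypothetical. A SIB base of 101B with Mod = 00B also implies a disp32 (see note 1 of Table 2-3), so the sketch needs the SIB byte in that case.

```python
from __future__ import annotations

def modrm32_followers(modrm: int, sib: int | None = None) -> tuple[bool, int]:
    """Return (sib_follows, displacement_bytes) for a 32-bit ModR/M byte."""
    mod, rm = modrm >> 6, modrm & 0b111
    if mod == 0b11:
        return False, 0              # register operand: nothing follows
    has_sib = rm == 0b100            # the [--][--] rows
    if mod == 0b01:
        return has_sib, 1            # disp8
    if mod == 0b10:
        return has_sib, 4            # disp32
    if rm == 0b101:
        return False, 4              # Mod = 00B, R/M = 101B: disp32 only
    if has_sib:
        if sib is None:
            raise ValueError("SIB byte needed to size the displacement")
        return True, 4 if (sib & 0b111) == 0b101 else 0
    return False, 0
```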


Table 2-3 is organized to give 256 possible values of the SIB byte (in hexadecimal). General purpose registers used 
as a base are indicated across the top of the table, along with corresponding values for the SIB byte's base field. 
Table rows in the body of the table indicate the register used as the index (SIB byte bits 3, 4 and 5) and the scaling 
factor (determined by SIB byte bits 6 and 7). 
Table 2-3. 32-Bit Addressing Forms with the SIB Byte

r32                           EAX   ECX   EDX   EBX   ESP   [*]   ESI   EDI
(In decimal) Base =           0     1     2     3     4     5     6     7
(In binary) Base =            000   001   010   011   100   101   110   111

Scaled Index        SS   Index Value of SIB Byte (in Hexadecimal)
[EAX]               00   000  00    01    02    03    04    05    06    07
[ECX]               00   001  08    09    0A    0B    0C    0D    0E    0F
[EDX]               00   010  10    11    12    13    14    15    16    17
[EBX]               00   011  18    19    1A    1B    1C    1D    1E    1F
none                00   100  20    21    22    23    24    25    26    27
[EBP]               00   101  28    29    2A    2B    2C    2D    2E    2F
[ESI]               00   110  30    31    32    33    34    35    36    37
[EDI]               00   111  38    39    3A    3B    3C    3D    3E    3F
[EAX*2]             01   000  40    41    42    43    44    45    46    47
[ECX*2]             01   001  48    49    4A    4B    4C    4D    4E    4F
[EDX*2]             01   010  50    51    52    53    54    55    56    57
[EBX*2]             01   011  58    59    5A    5B    5C    5D    5E    5F
none                01   100  60    61    62    63    64    65    66    67
[EBP*2]             01   101  68    69    6A    6B    6C    6D    6E    6F
[ESI*2]             01   110  70    71    72    73    74    75    76    77
[EDI*2]             01   111  78    79    7A    7B    7C    7D    7E    7F
[EAX*4]             10   000  80    81    82    83    84    85    86    87
[ECX*4]             10   001  88    89    8A    8B    8C    8D    8E    8F
[EDX*4]             10   010  90    91    92    93    94    95    96    97
[EBX*4]             10   011  98    99    9A    9B    9C    9D    9E    9F
none                10   100  A0    A1    A2    A3    A4    A5    A6    A7
[EBP*4]             10   101  A8    A9    AA    AB    AC    AD    AE    AF
[ESI*4]             10   110  B0    B1    B2    B3    B4    B5    B6    B7
[EDI*4]             10   111  B8    B9    BA    BB    BC    BD    BE    BF
[EAX*8]             11   000  C0    C1    C2    C3    C4    C5    C6    C7
[ECX*8]             11   001  C8    C9    CA    CB    CC    CD    CE    CF
[EDX*8]             11   010  D0    D1    D2    D3    D4    D5    D6    D7
[EBX*8]             11   011  D8    D9    DA    DB    DC    DD    DE    DF
none                11   100  E0    E1    E2    E3    E4    E5    E6    E7
[EBP*8]             11   101  E8    E9    EA    EB    EC    ED    EE    EF
[ESI*8]             11   110  F0    F1    F2    F3    F4    F5    F6    F7
[EDI*8]             11   111  F8    F9    FA    FB    FC    FD    FE    FF

NOTES:

1. The [*] nomenclature means a disp32 with no base if the MOD is 00B. Otherwise, [*] means disp8 or disp32 + [EBP]. This provides the
following address modes:

MOD bits  Effective Address
00        [scaled index] + disp32
01        [scaled index] + disp8 + [EBP]
10        [scaled index] + disp32 + [EBP]
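Reading Table 2-3 and its note together can be sketched in Python. The helper below is hypothetical. It renders the address expression selected by a SIB byte, given the Mod field of the preceding ModR/M byte. The [*] column is handled as note 1 describes.

```python
REG32 = ["EAX", "ECX", "EDX", "EBX", "ESP", "EBP", "ESI", "EDI"]

def decode_sib32(sib: int, mod: int) -> str:
    """Render the address expression for a SIB byte (Table 2-3), given the
    Mod field (00B, 01B or 10B) of the ModR/M byte that precedes it."""
    if mod not in (0b00, 0b01, 0b10):
        raise ValueError("a SIB byte never follows Mod = 11B")
    ss, index, base = sib >> 6, (sib >> 3) & 0b111, sib & 0b111
    disp = {0b00: None, 0b01: "disp8", 0b10: "disp32"}[mod]
    terms = []
    if base == 0b101 and mod == 0b00:
        disp = "disp32"                      # [*] with Mod = 00B: no base
    else:
        terms.append(REG32[base])            # otherwise [*] means EBP
    if index != 0b100:                       # index = 100B means "none"
        terms.append(f"{REG32[index]}*{1 << ss}")
    if disp:
        terms.append(disp)
    return " + ".join(terms)
```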

2.2 IA-32E MODE


IA-32e mode has two sub-modes. These are: 

• Compatibility Mode. Enables a 64-bit operating system to run most legacy protected mode software 
unmodified. 

• 64-Bit Mode. Enables a 64-bit operating system to run applications written to access 64-bit address space. 
2.2.1 REX Prefixes

REX prefixes are instruction-prefix bytes used in 64-bit mode. They do the following: 

• Specify GPRs and SSE registers. 

• Specify 64-bit operand size. 

• Specify extended control registers. 

Not all instructions require a REX prefix in 64-bit mode. A prefix is necessary only if an instruction references one
of the extended registers or uses a 64-bit operand. If a REX prefix is used when it has no meaning, it is ignored.
Only one REX prefix is allowed per instruction. If used, the REX prefix byte must immediately precede the opcode
byte or the escape opcode byte (0FH). When a REX prefix is used in conjunction with an instruction containing a
mandatory prefix, the mandatory prefix must come before the REX so the REX prefix can immediately precede
the opcode or the escape byte. For example, CVTDQ2PD with a REX prefix should have REX placed between F3 and
0F E6. Other placements are ignored. The instruction-size limit of 15 bytes still applies to instructions with a REX
prefix. See Figure 2-3.









Legacy Prefixes | REX Prefix | Opcode | ModR/M | SIB | Displacement | Immediate
Grp 1, Grp 2, Grp 3, Grp 4 (optional) | (optional) | 1-, 2-, or 3-byte opcode | 1 byte (if required) | 1 byte (if required) | Address displacement of 1, 2, or 4 bytes | Immediate data of 1, 2, or 4 bytes or none

Figure 2-3. Prefix Ordering in 64-bit Mode
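The placement rule can be illustrated with a short Python sketch that encodes CVTDQ2PD XMM8, XMM1 (F3 0F E6 /r, as referenced above). The destination XMM8 needs REX.R = 1. The REX byte therefore goes after the mandatory F3H prefix and immediately before the 0FH escape byte. The `rex` helper is hypothetical.

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix: fixed 0100B in bits 7:4, then W, R, X, B."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

# CVTDQ2PD XMM8, XMM1 (F3 0F E6 /r), register-to-register form (Mod = 11B)
dest, src = 8, 1
modrm = (0b11 << 6) | ((dest & 0b111) << 3) | (src & 0b111)
insn = bytes([
    0xF3,                          # mandatory prefix comes first
    rex(r=dest >> 3, b=src >> 3),  # REX immediately precedes the escape byte
    0x0F, 0xE6,                    # escape byte and opcode
    modrm,
])
print(insn.hex(" "))  # f3 44 0f e6 c1
```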


2.2.1.1 Encoding 

Intel 64 and IA-32 instruction formats specify up to three registers by using 3-bit fields in the encoding, depending 
on the format: 

• ModR/M: the reg and r/m fields of the ModR/M byte. 

• ModR/M with SIB: the reg field of the ModR/M byte, the base and index fields of the SIB (scale, index, base) 
byte. 

• Instructions without ModR/M: the reg field of the opcode. 

In 64-bit mode, these formats do not change. Bits needed to define fields in the 64-bit context are provided by the 
addition of REX prefixes. 

2.2.1.2 More on REX Prefix Fields 

REX prefixes are a set of 16 opcodes that span one row of the opcode map and occupy entries 40H to 4FH. These 
opcodes represent valid instructions (INC or DEC) in IA-32 operating modes and in compatibility mode. In 64-bit 
mode, the same opcodes represent the instruction prefix REX and are not treated as individual instructions. 

The single-byte-opcode forms of the INC/DEC instructions are not available in 64-bit mode. INC/DEC functionality 
is still available using ModR/M forms of the same instructions (opcodes FF/0 and FF/1). 

See Table 2-4 for a summary of the REX prefix format. Figure 2-4 through Figure 2-7 show examples of REX prefix
fields in use. Some combinations of REX prefix fields are invalid. In such cases, the prefix is ignored. Some
additional information follows:

• Setting REX.W can be used to determine the operand size but does not solely determine operand width. Like 
the 66H size prefix, 64-bit operand size override has no effect on byte-specific operations. 

• For non-byte operations: if a 66H prefix is used with a REX prefix that has REX.W = 1, 66H is ignored.

• If a 66H override is used with REX and REX.W = 0, the operand size is 16 bits. 
• REX.R modifies the ModR/M reg field when that field encodes a GPR, SSE, control or debug register. REX.R is
ignored when ModR/M specifies other registers or defines an extended opcode.

• REX.X bit modifies the SIB index field. 

• REX.B either modifies the base in the ModR/M r/m field or SIB base field; or it modifies the opcode reg field 
used for accessing GPRs. 


Table 2-4. REX Prefix Fields [BITS: 0100WRXB]

Field Name  Bit Position  Definition
-           7:4           0100
W           3             0 = Operand size determined by CS.D
                          1 = 64 Bit Operand Size
R           2             Extension of the ModR/M reg field
X           1             Extension of the SIB index field
B           0             Extension of the ModR/M r/m field, SIB base field, or Opcode reg field



REX prefix 0100WR0B | Opcode | ModR/M: mod ≠ 11, reg = rrr, r/m = bbb
REX.R and rrr form Rrrr; REX.B and bbb form Bbbb.

Figure 2-4. Memory Addressing Without an SIB Byte; REX.X Not Used




REX prefix 0100WR0B | Opcode | ModR/M: mod = 11, reg = rrr, r/m = bbb
REX.R and rrr form Rrrr; REX.B and bbb form Bbbb.

Figure 2-5. Register-Register Addressing (No Memory Operand); REX.X Not Used
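REX prefix 0100WRXB | Opcode | ModR/M: mod ≠ 11, reg = rrr, r/m = 100 | SIB: scale = ss, index = xxx, base = bbb
REX.R and rrr form Rrrr; REX.X and xxx form Xxxx; REX.B and bbb form Bbbb.

Figure 2-6. Memory Addressing With a SIB Byte

REX prefix 0100W00B | Opcode with register field bbb
REX.B and bbb form Bbbb.

Figure 2-7. Register Operand Coded in Opcode Byte; REX.X & REX.R Not Used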
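Figures 2-4 through 2-7 each concatenate one REX bit with a 3-bit field. Here is a Python sketch of that concatenation, with hypothetical function names. Pass `sib` only when a SIB byte is actually present.

```python
from __future__ import annotations

def rex_bits(rex: int) -> tuple[int, int, int]:
    """Return (R, X, B) from a REX prefix byte (40H-4FH)."""
    return (rex >> 2) & 1, (rex >> 1) & 1, rex & 1

def extended_fields(rex: int, modrm: int, sib: int | None = None) -> dict[str, int]:
    """Widen ModR/M and SIB register fields to 4 bits (Figures 2-4 to 2-6)."""
    r, x, b = rex_bits(rex)
    fields = {"reg": (r << 3) | ((modrm >> 3) & 0b111)}      # Rrrr
    if sib is None:
        fields["rm"] = (b << 3) | (modrm & 0b111)            # Bbbb from r/m
    else:
        fields["index"] = (x << 3) | ((sib >> 3) & 0b111)    # Xxxx
        fields["base"] = (b << 3) | (sib & 0b111)            # Bbbb from SIB base
    return fields

def opcode_register(rex: int, opcode: int) -> int:
    """Figure 2-7: REX.B extends the register field in the opcode's low 3 bits."""
    return ((rex & 1) << 3) | (opcode & 0b111)
```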


In the IA-32 architecture, byte registers (AH, AL, BH, BL, CH, CL, DH, and DL) are encoded in the ModR/M byte's
reg field, the r/m field or the opcode reg field as registers 0 through 7. REX prefixes provide an additional
addressing capability for byte registers that makes the least-significant byte of GPRs available for byte operations.
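When any REX prefix is present, byte-register encodings 4 through 7 select SPL, BPL, SIL, and DIL instead of AH, CH, DH, and BH. This holds even for 40H, which sets no field bits. This Python sketch covers only encodings 0 through 7. It leaves out the low bytes of R8-R15, which are reached through REX.R or REX.B. The function name is hypothetical.

```python
LEGACY8 = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]   # no REX prefix
REX8 = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]  # any REX prefix

def byte_register(num: int, rex_present: bool) -> str:
    """Name the byte register selected by a 3-bit register field (0-7)."""
    if not 0 <= num <= 7:
        raise ValueError("this sketch only covers encodings 0-7")
    return (REX8 if rex_present else LEGACY8)[num]
```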

Certain combinations of the fields of the ModR/M byte and the SIB byte have special meaning for register encodings.
For some combinations, fields expanded by the REX prefix are not decoded. Table 2-5 describes how each
case behaves.
Table 2-5. Special Cases of REX Encodings

ModR/M or SIB | Sub-field Encodings | Compatibility Mode Operation | Compatibility Mode Implications | Additional Implications
ModR/M Byte | mod ≠ 11; r/m = b*100 (ESP) | SIB byte present. | SIB byte required for ESP-based addressing. | REX prefix adds a fourth bit (b), which is not decoded (don't care). SIB byte also required for R12-based addressing.
ModR/M Byte | mod = 0; r/m = b*101 (EBP) | Base register not used. | EBP without a displacement must be done using mod = 01 with a displacement of 0. | REX prefix adds a fourth bit (b), which is not decoded (don't care). Using RBP or R13 without displacement must be done using mod = 01 with a displacement of 0.
SIB Byte | index = 0100 (ESP) | Index register not used. | ESP cannot be used as an index register. | REX prefix adds a fourth bit (b), which is decoded. There are no additional implications. The expanded index field allows distinguishing RSP from R12; therefore, R12 can be used as an index.
SIB Byte | base = 0101 (EBP) | Base register is unused if mod = 0. | Base register depends on mod encoding. | REX prefix adds a fourth bit (b), which is not decoded. This requires explicit displacement to be used with EBP/RBP or R13.

NOTES:

* Don't care about value of REX.B
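The two "not decoded" ModR/M rows above determine how a plain [base] operand must be encoded. This Python sketch returns the REX.B bit plus the ModR/M, SIB, and displacement bytes for [base] with no displacement. It assumes 64-bit mode, and the helper is hypothetical. The caller is responsible for REX.R when reg ≥ 8.

```python
from __future__ import annotations

def encode_base_only(base: int, reg: int = 0) -> tuple[int, bytes]:
    """Encode the memory operand [base] with no displacement in 64-bit mode.

    base is 0..15 (12 = R12, 13 = R13). Only the low 3 bits of reg go into
    ModR/M. Returns (REX.B, ModR/M byte [+ SIB byte] [+ disp8])."""
    rex_b, low = base >> 3, base & 0b111
    reg_bits = (reg & 0b111) << 3
    if low == 0b100:
        # RSP/R12: r/m = 100B always means "SIB follows", whatever REX.B is.
        # SIB 24H = scale x1, index 100B (none), base 100B (extended by REX.B).
        return rex_b, bytes([reg_bits | 0b100, 0x24])
    if low == 0b101:
        # RBP/R13: Mod = 00B with r/m = 101B is not a base form, so use
        # Mod = 01B with an explicit disp8 of zero.
        return rex_b, bytes([0b01 << 6 | reg_bits | 0b101, 0x00])
    return rex_b, bytes([reg_bits | low])    # Mod = 00B, plain [base]
```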


2.2.1.3 Displacement 

Addressing in 64-bit mode uses existing 32-bit ModR/M and SIB encodings. The ModR/M and SIB displacement 
sizes do not change. They remain 8 bits or 32 bits and are sign-extended to 64 bits. 
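A minimal Python sketch of that sign extension follows (function names are hypothetical). The result is returned as the unsigned 64-bit pattern that is added to the base.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement integer."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return (value ^ sign) - sign

def displacement64(raw: bytes) -> int:
    """Sign-extend a little-endian disp8 or disp32 field to a 64-bit pattern."""
    if len(raw) not in (1, 4):
        raise ValueError("ModR/M and SIB displacements are 8 or 32 bits")
    disp = sign_extend(int.from_bytes(raw, "little"), 8 * len(raw))
    return disp & 0xFFFF_FFFF_FFFF_FFFF
```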

2.2.1.4 Direct Memory-Offset MOVs 

In 64-bit mode, direct memory-offset forms of the MOV instruction are extended to specify a 64-bit immediate 
absolute address. This address is called a moffset. No prefix is needed to specify this 64-bit memory offset. For 
these MOV instructions, the size of the memory offset follows the address-size default (64 bits in 64-bit mode). See 
Table 2-6. 


Table 2-6. Direct Memory Offset Form of MOV

Opcode  Instruction
A0      MOV AL, moffset
A1      MOV EAX, moffset
A2      MOV moffset, AL
A3      MOV moffset, EAX
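For example, in 64-bit mode MOV EAX, moffset (opcode A1) carries a full 8-byte absolute address, stored little-endian, with no prefix. Here is a Python sketch with a hypothetical helper name.

```python
import struct

def mov_eax_from_moffset(address: int) -> bytes:
    """Encode MOV EAX, moffset (A1) in 64-bit mode with an 8-byte offset."""
    return bytes([0xA1]) + struct.pack("<Q", address)  # little-endian 64-bit moffset
```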


2.2.1.5 Immediates 

In 64-bit mode, the typical size of immediate operands remains 32 bits. When the operand size is 64 bits, the 
processor sign-extends all immediates to 64 bits prior to their use. 

Support for 64-bit immediate operands is accomplished by expanding the semantics of the existing move (MOV
reg, imm16/32) instructions. These instructions (opcodes B8H - BFH) move 16 bits or 32 bits of immediate data
(depending on the effective operand size) into a GPR. When the effective operand size is 64 bits, these instructions
can be used to load a 64-bit immediate into a GPR. A REX prefix is needed to override the 32-bit default operand size to
a 64-bit operand size.

For example: 

48 B8 8877665544332211 MOV RAX,1122334455667788H 
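The same encoding applies to any GPR. REX.W selects the 64-bit operand size. REX.B supplies the fourth register bit for R8-R15, and the low three bits are added to B8H. The Python sketch below uses a hypothetical helper name and reproduces the MOV RAX example above.

```python
import struct

def mov_r64_imm64(reg: int, imm: int) -> bytes:
    """Encode MOV r64, imm64 (REX.W + B8+r, 8-byte immediate) for reg 0..15."""
    rex = 0x48 | (reg >> 3)          # REX.W = 1; REX.B extends the opcode reg field
    opcode = 0xB8 + (reg & 0b111)    # low 3 register bits live in the opcode
    return bytes([rex, opcode]) + struct.pack("<Q", imm & 0xFFFF_FFFF_FFFF_FFFF)
```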
2.2.1.6 RIP-Relative Addressing

A new addressing form, RIP-relative (relative instruction-pointer) addressing, is implemented in 64-bit mode. An
effective address is formed by adding a displacement to the 64-bit RIP of the next instruction.

In IA-32 architecture and compatibility mode, addressing relative to the instruction pointer is available only with 
control-transfer instructions. In 64-bit mode, instructions that use ModR/M addressing can use RIP-relative 
addressing. Without RIP-relative addressing, all ModR/M modes address memory relative to zero. 

RIP-relative addressing allows specific ModR/M modes to address memory relative to the 64-bit RIP using a signed 
32-bit displacement. This provides an offset range of ±2GB from the RIP. Table 2-7 shows the ModR/M and SIB 
encodings for RIP-relative addressing. Redundant forms of 32-bit displacement-addressing exist in the current 
ModR/M and SIB encodings. There is one ModR/M encoding and there are several SIB encodings. RIP-relative 
addressing is encoded using a redundant form. 

In 64-bit mode, the ModR/M Disp32 (32-bit displacement) encoding is redefined to be RIP + Disp32 rather than
displacement-only. See Table 2-7.


Table 2-7. RIP-Relative Addressing

ModR/M and SIB Sub-field Encodings | Compatibility Mode Operation | 64-bit Mode Operation | Additional Implications in 64-bit mode
ModR/M Byte: mod = 00, r/m = 101 (none) | Disp32 | RIP + Disp32 | Must use SIB form with normal (zero-based) displacement addressing
SIB Byte: base = 101 (none), index = 100 (none), scale = 0,1,2,4 | if mod = 00, Disp32 | Same as legacy | None


The ModR/M encoding for RIP-relative addressing does not depend on using a prefix. Specifically, the r/m bit field
encoding of 101B (used to select RIP-relative addressing) is not affected by the REX prefix. For example, selecting
R13 (REX.B = 1, r/m = 101B) with mod = 00B still results in RIP-relative addressing. The 4-bit r/m field of REX.B
combined with ModR/M is not fully decoded. In order to address R13 with no displacement, software must encode
R13 + 0 using a 1-byte displacement of zero.

RIP-relative addressing is enabled by 64-bit mode, not by a 64-bit address-size. The use of the address-size prefix 
does not disable RIP-relative addressing. The effect of the address-size prefix is to truncate and zero-extend the 
computed effective address to 32 bits. 
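The effective-address computation can be sketched in Python as follows, using a hypothetical helper. The example assumes MOV RAX, [RIP+10H], encoded 48 8B 05 10 00 00 00 (7 bytes) and located at address 401000H. The RIP of the next instruction is then 401007H. The `addr32` flag models the address-size prefix described above.

```python
def rip_relative_ea(insn_addr: int, insn_len: int, disp32: int,
                    addr32: bool = False) -> int:
    """Effective address of a RIP-relative operand (Mod = 00B, R/M = 101B).

    disp32 is the raw 32-bit displacement field; addr32 models a 67H prefix."""
    next_rip = insn_addr + insn_len                   # RIP of the next instruction
    disp = ((disp32 & 0xFFFF_FFFF) ^ 0x8000_0000) - 0x8000_0000  # sign-extend
    ea = (next_rip + disp) & 0xFFFF_FFFF_FFFF_FFFF
    if addr32:
        ea &= 0xFFFF_FFFF                             # truncate and zero-extend
    return ea

print(hex(rip_relative_ea(0x401000, 7, 0x10)))  # 0x401017
```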

2.2.1.7 Default 64-Bit Operand Size

In 64-bit mode, two groups of instructions have a default operand size of 64 bits (do not need a REX prefix for this 
operand size). These are: 

• Near branches. 

• All instructions, except far branches, that implicitly reference the RSP. 


2.2.2 Additional Encodings for Control and Debug Registers

In 64-bit mode, more encodings for control and debug registers are available. The REX.R bit is used to modify the
ModR/M reg field when that field encodes a control or debug register (see Table 2-4). These encodings enable the
processor to address CR8-CR15 and DR8-DR15. An additional control register (CR8) is defined in 64-bit mode. CR8
becomes the Task Priority Register (TPR).

In the first implementation of IA-32e mode, CR9-CR15 and DR8-DR15 are not implemented. Any attempt to access 
unimplemented registers results in an invalid-opcode exception (#UD). 
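As an example of REX.R reaching CR8, this Python sketch encodes MOV CRn, r64. The helper is hypothetical. The opcode is 0F 22 /r, taken from the MOV (control registers) instruction reference. No REX.W is emitted because this instruction always uses 64-bit operands in 64-bit mode.

```python
def mov_cr_from_gpr(cr: int, gpr: int) -> bytes:
    """Encode MOV CRn, r64 (0F 22 /r) in 64-bit mode; cr and gpr are 0..15.

    The control register goes in ModR/M.reg (extended by REX.R), the GPR in
    ModR/M.r/m with Mod = 11B (extended by REX.B)."""
    r, b = cr >> 3, gpr >> 3
    prefix = bytes([0x40 | (r << 2) | b]) if (r or b) else b""
    modrm = 0xC0 | ((cr & 0b111) << 3) | (gpr & 0b111)
    return prefix + bytes([0x0F, 0x22, modrm])
```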
2.3 INTEL® ADVANCED VECTOR EXTENSIONS (INTEL® AVX)

Intel AVX instructions are encoded using an encoding scheme that combines prefix bytes, opcode extension field,
operand encoding fields, and vector length encoding capability into a new prefix, referred to as VEX. In the VEX
encoding scheme, the VEX prefix may be two or three bytes long, depending on the instruction semantics. Despite
the two-byte or three-byte length of the VEX prefix, the VEX encoding format provides a more compact
representation/packing of the components of encoding an instruction in Intel 64 architecture. The VEX encoding scheme
also allows more headroom for future growth of Intel 64 architecture.


2.3.1 Instruction Format 

Instruction encoding using VEX prefix provides several advantages: 

• Instruction syntax support for three operands and up to four operands when necessary. For example, the third
source register used by VBLENDVPD is encoded using bits 7:4 of the immediate byte.

• Encoding support for vector length of 128 bits (using XMM registers) and 256 bits (using YMM registers).

• Encoding support for instruction syntax of non-destructive source operands. 

• Elimination of escape opcode byte (0FH), SIMD prefix byte (66H, F2H, F3H) via a compact bit field
representation within the VEX prefix.

• Elimination of the need to use REX prefix to encode the extended half of general-purpose register sets (R8- 
R15) for direct register access, memory addressing, or accessing XMM8-XMM15 (including YMM8-YMM15). 

• Flexible and more compact bit fields are provided in the VEX prefix to retain the full functionality provided by 
REX prefix. REX.W, REX.X, REX.B functionalities are provided in the three-byte VEX prefix only because only a 
subset of SIMD instructions need them. 

• Extensibility for future instruction extensions without significant instruction length increase. 

Figure 2-8 shows the Intel 64 instruction encoding format with VEX prefix support. Legacy instructions without a
VEX prefix are fully supported and unchanged. The use of the VEX prefix in an Intel 64 instruction is optional, but a VEX
prefix is required for Intel 64 instructions that operate on YMM registers or support three- and four-operand syntax.

The VEX prefix is not a constant-valued, "single-purpose" byte like 0FH, 66H, F2H, F3H in legacy SSE instructions. The VEX
prefix provides substantially richer capability than the REX prefix.


Field:    [Prefixes]  [VEX]  OPCODE  ModR/M  [SIB]  [DISP]   [IMM]
# Bytes:              2,3    1       1       0,1    0,1,2,4  0,1

Figure 2-8. Instruction Encoding Format with VEX Prefix

2.3.2 VEX and the LOCK prefix 

Any VEX-encoded instruction with a LOCK prefix preceding VEX will #UD. 

2.3.3 VEX and the 66H, F2H, and F3H prefixes 

Any VEX-encoded instruction with a 66H, F2H, or F3H prefix preceding VEX will #UD. 

2.3.4 VEX and the REX prefix 

Any VEX-encoded instruction with a REX prefix preceding VEX will #UD.
2.3.5 The VEX Prefix

The VEX prefix is encoded in either the two-byte form (the first byte must be C5H) or in the three-byte form (the
first byte must be C4H). The two-byte VEX is used mainly for 128-bit, scalar, and the most common 256-bit AVX
instructions, while the three-byte VEX provides a compact replacement of REX and 3-byte opcode instructions
(including AVX and FMA instructions). Beyond the first byte, the VEX prefix consists of a number of bit fields
providing specific capability, as shown in Figure 2-9.

The bit fields of the VEX prefix can be summarized by their functional purposes:

• Non-destructive source register encoding (applicable to three and four operand syntax): This is the first source
operand in the instruction syntax. It is represented by the notation, VEX.vvvv. This field is encoded using 1's
complement form (inverted form), i.e. XMM0/YMM0/R0 is encoded as 1111B, XMM15/YMM15/R15 is encoded
as 0000B.

• Vector length encoding: This 1-bit field is represented by the notation VEX.L. L = 0 means the vector length is 128 bits
wide; L = 1 means a 256-bit vector. The value of this field is written as VEX.128 or VEX.256 in this document to
distinguish encoded values of other VEX bit fields.

• REX prefix functionality: Full REX prefix functionality is provided in the three-byte form of VEX prefix. However,
the VEX bit fields providing REX functionality are encoded using 1's complement form, i.e. XMM0/YMM0/R0 is
encoded as 1111B, XMM15/YMM15/R15 is encoded as 0000B.

— The two-byte form of the VEX prefix only provides the equivalent functionality of REX.R, using 1's complement
encoding. This is represented as VEX.R.

— The three-byte form of the VEX prefix provides REX.R, REX.X, REX.B functionality using 1's complement
encoding and three dedicated bit fields represented as VEX.R, VEX.X, VEX.B.

— The three-byte form of the VEX prefix provides the functionality of REX.W only to specific instructions that need
to override the default 32-bit operand size for a general purpose register to 64-bit size in 64-bit mode. For
those applicable instructions, the VEX.W field provides the same functionality as REX.W. The VEX.W field can
provide completely different functionality for other instructions.

Consequently, the use of REX prefix with VEX encoded instructions is not allowed. However, the intent of the 
REX prefix for expanding register set is reserved for future instruction set extensions using VEX prefix 
encoding format. 

• Compaction of SIMD prefix: Legacy SSE instructions effectively use SIMD prefixes (66H, F2H, F3H) as an
opcode extension field. VEX prefix encoding allows the functional capability of such legacy SSE instructions
(operating on XMM registers, bits 255:128 of corresponding YMM unmodified) to be encoded using the VEX.pp
field without the presence of any SIMD prefix. The VEX-encoded 128-bit instruction will zero out bits 255:128
of the destination register. VEX-encoded instructions may have a 128-bit or 256-bit vector length.

• Compaction of two-byte and three-byte opcode: More recently introduced legacy SSE instructions employ two-
and three-byte opcodes. The one or two leading bytes are: 0FH, and 0FH 3AH/0FH 38H. The one-byte escape
(0FH) and two-byte escape (0FH 3AH, 0FH 38H) can also be interpreted as an opcode extension field. The
VEX.mmmmm field provides compaction to allow many legacy instructions to be encoded without the constant
byte sequence 0FH, 0FH 3AH, or 0FH 38H. These VEX-encoded instructions may have a 128-bit or 256-bit
vector length.
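Both compaction mechanisms reduce to table lookups. This Python sketch maps a legacy SIMD prefix and escape sequence to the VEX fields. The names are hypothetical, and the values are taken from the pp and m-mmmm encodings in Figure 2-9. For example, PSHUFB (66 0F 38 00 /r) becomes pp = 01B, m-mmmm = 00010B.

```python
from __future__ import annotations

PP = {None: 0b00, 0x66: 0b01, 0xF3: 0b10, 0xF2: 0b11}                  # VEX.pp
MMMMM = {b"\x0f": 0b00001, b"\x0f\x38": 0b00010, b"\x0f\x3a": 0b00011}  # VEX.m-mmmm

def compact_legacy(simd_prefix: int | None, escape: bytes) -> tuple[int, int]:
    """Map a legacy SIMD prefix and escape sequence to (VEX.pp, VEX.m-mmmm)."""
    return PP[simd_prefix], MMMMM[escape]
```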

The VEX prefix is required to be the last prefix and immediately precedes the opcode bytes. It must follow any
other prefixes. If a VEX prefix is present, a REX prefix is not supported.

The 3-byte VEX leaves room for future expansion with 3 reserved bits. REX and the 66h/F2h/F3h prefixes are 
reclaimed for future use. 

The VEX prefix has a two-byte form and a three-byte form. If an instruction syntax can be encoded using the two-byte
form, it can also be encoded using the three-byte form of VEX. The latter increases the length of the instruction by
one byte. This may be helpful in some situations for code alignment.

The VEX prefix supports 256-bit versions of floating-point SSE, SSE2, SSE3, and SSE4 instructions. Note, certain 
new instruction functionality can only be encoded with the VEX prefix. 

The VEX prefix will #UD on any instruction containing MMX register sources or destinations. 
3-byte VEX:
  Byte 0: 11000100 (C4H)
  Byte 1: bit 7 R | bit 6 X | bit 5 B | bits 4:0 m-mmmm
  Byte 2: bit 7 W | bits 6:3 vvvv | bit 2 L | bits 1:0 pp
2-byte VEX:
  Byte 0: 11000101 (C5H)
  Byte 1: bit 7 R | bits 6:3 vvvv | bit 2 L | bits 1:0 pp

R: REX.R in 1's complement (inverted) form
  1: Same as REX.R=0 (must be 1 in 32-bit mode)
  0: Same as REX.R=1 (64-bit mode only)
X: REX.X in 1's complement (inverted) form
  1: Same as REX.X=0 (must be 1 in 32-bit mode)
  0: Same as REX.X=1 (64-bit mode only)
B: REX.B in 1's complement (inverted) form
  1: Same as REX.B=0 (Ignored in 32-bit mode)
  0: Same as REX.B=1 (64-bit mode only)
W: opcode specific (use like REX.W, or used for opcode extension, or ignored, depending on the opcode byte)
m-mmmm:
  00000: Reserved for future use (will #UD)
  00001: implied 0F leading opcode byte
  00010: implied 0F 38 leading opcode bytes
  00011: implied 0F 3A leading opcode bytes
  00100-11111: Reserved for future use (will #UD)
vvvv: a register specifier (in 1's complement form) or 1111 if unused.
L: Vector Length
  0: scalar or 128-bit vector
  1: 256-bit vector
pp: opcode extension providing equivalent functionality of a SIMD prefix
  00: None
  01: 66
  10: F3
  11: F2

Figure 2-9. VEX bit fields
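This Python sketch puts the fields together. It builds a VEX prefix from un-inverted field values and chooses the 2-byte form whenever that form is sufficient. The helper is hypothetical. The example encodes VADDPS YMM0, YMM1, YMM2 (VEX.256.0F.WIG 58 /r, from the VADDPS instruction reference), which should yield C5 F4 58 C2.

```python
def vex_prefix(r: int, x: int, b: int, mmmmm: int, w: int,
               vvvv: int, l: int, pp: int) -> bytes:
    """Build a VEX prefix from un-inverted values (r/x/b as REX bits, vvvv as a
    register number 0..15). R, X, B and vvvv are stored inverted. The 2-byte
    form (C5H) is used when X = B = 0, W = 0 and m-mmmm = 00001B."""
    nr, nx, nb = r ^ 1, x ^ 1, b ^ 1          # 1's complement of REX bits
    nvvvv = ~vvvv & 0b1111                    # 1's complement of vvvv
    tail = (nvvvv << 3) | (l << 2) | pp       # vvvv, L, pp share one layout
    if x == 0 and b == 0 and w == 0 and mmmmm == 0b00001:
        return bytes([0xC5, (nr << 7) | tail])
    return bytes([0xC4, (nr << 7) | (nx << 6) | (nb << 5) | mmmmm, (w << 7) | tail])

# VADDPS YMM0, YMM1, YMM2: reg = YMM0, vvvv = YMM1, r/m = YMM2 (ModR/M C2H)
insn = vex_prefix(r=0, x=0, b=0, mmmmm=0b00001, w=0, vvvv=1, l=1, pp=0b00) + bytes([0x58, 0xC2])
print(insn.hex(" "))  # c5 f4 58 c2
```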

The following subsections describe the various fields in two or three-byte VEX prefix. 

2.3.5.1 VEX Byte 0, bits[7:0]

VEX Byte 0, bits [7:0] must contain the value 11000101b (C5h) or 11000100b (C4h). The 3-byte VEX uses the C4h 
first byte, while the 2-byte VEX uses the C5h first byte. 


2.3.5.2 VEX Byte 1, bit [7] - 'R'

VEX Byte 1, bit [7] contains a bit analogous to a bit-inverted REX.R. In protected and compatibility modes, the bit
must be set to '1'; otherwise, the instruction is LES or LDS.
This bit is present in both 2- and 3-byte VEX prefixes.

The usage of WRXB bits for legacy instructions is explained in detail in Section 2.2.1.2 of the Intel 64 and IA-32
Architectures Software Developer's Manual, Volume 2A.

This bit is stored in bit inverted format. 


2.3.5.3 3-byte VEX byte 1, bit[6] - 'X'

Bit[6] of the 3-byte VEX byte 1 encodes a bit analogous to a bit inverted REX.X. It is an extension of the SIB Index 
field in 64-bit modes. In 32-bit modes, this bit must be set to '1' otherwise the instruction is LES or LDS. 

This bit is available only in the 3-byte VEX prefix. 

This bit is stored in bit inverted format. 


2.3.5.4 3-byte VEX byte 1, bit[5] - 'B'

Bit[5] of the 3-byte VEX byte 1 encodes a bit analogous to a bit inverted REX.B. In 64-bit modes, it is an extension 
of the ModR/M r/m field, or the SIB base field. In 32-bit modes, this bit is ignored. 

This bit is available only in the 3-byte VEX prefix. 

This bit is stored in bit inverted format. 


2.3.5.5 3-byte VEX byte 2, bit[7] - 'W'

Bit[7] of the 3-byte VEX byte 2 is represented by the notation VEX.W. It can provide the following functions, depending
on the specific opcode.

• For AVX instructions that have equivalent legacy SSE instructions (typically these SSE instructions have a
general-purpose register operand with its operand size attribute promotable by REX.W), if REX.W promotes
the operand size attribute of the general-purpose register operand in the legacy SSE instruction, VEX.W has the
same meaning in the corresponding AVX equivalent form. In 32-bit modes, VEX.W is silently ignored.

• For AVX instructions that have equivalent legacy SSE instructions (typically these SSE instructions have
operands with their operand size attribute fixed and not promotable by REX.W), if REX.W is a don't-care in the
legacy SSE instruction, VEX.W is ignored in the corresponding AVX equivalent form irrespective of mode.

• For new AVX instructions where VEX.W has no defined function (typically, the combination of the opcode byte
and VEX.mmmmm has no equivalent SSE function), VEX.W is reserved as zero, and setting it to any other value
causes the instruction to #UD.

2.3.5.6 2-byte VEX Byte 1, bits[6:3] and 3-byte VEX Byte 2, bits [6:3] - 'vvvv' the Source or Dest
Register Specifier

In 32-bit mode, the VEX first bytes C4h and C5h alias onto the LES and LDS instructions. To maintain compatibility
with existing programs, bits [7:6] of the second VEX byte must be 11b. To achieve this, the VEX payload bits are
selected to place only inverted, 64-bit valid fields (extended register selectors) in these upper bits.
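The aliasing rule above can be sketched as a small Python predicate that decides whether a C4h/C5h byte starts a VEX prefix or an LES/LDS instruction. The function name and the boolean `mode_64bit` parameter are hypothetical; the sketch relies on two facts: LES and LDS are not valid in 64-bit mode, and they always take a memory operand, so their ModR/M.mod field is never 11b.

```python
def is_vex_prefix(byte0: int, byte1: int, mode_64bit: bool) -> bool:
    """Decide whether byte0 (C4h/C5h) starts a VEX prefix or an LES/LDS instruction."""
    if byte0 not in (0xC4, 0xC5):
        return False
    if mode_64bit:
        # LES and LDS are not valid in 64-bit mode, so C4h/C5h always start VEX.
        return True
    # Outside 64-bit mode, LES/LDS need a memory operand, so their ModR/M.mod
    # (bits 7:6 of the next byte) is never 11b; VEX requires those bits to be 11b.
    return (byte1 >> 6) == 0b11
```

For example, C5h F8h is treated as a VEX prefix in every mode, while C5h 06h decodes as LDS outside 64-bit mode.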

The 2-byte VEX Byte 1, bits [6:3] and the 3-byte VEX Byte 2, bits [6:3] encode a field (shorthand VEX.vvvv) that,
for instructions with 2 or more source registers and an XMM or YMM or memory destination, encodes the first source
register specifier stored in inverted (1's complement) form.

VEX.vvvv is not used by instructions with one source (except certain shifts, see below) or by instructions with
no XMM or YMM or memory destination. If an instruction does not use VEX.vvvv, it should be set to 1111b;
otherwise, the instruction will #UD.

In 64-bit mode, all 4 bits may be used. See Table 2-8 for the encoding of the XMM or YMM registers. In 32-bit and
16-bit modes, bit 6 must be 1 (if bit 6 is not 1, the 2-byte VEX version will generate an LDS instruction and the
3-byte VEX version will ignore this bit).
Table 2-8. VEX.vvvv to register name mapping

VEX.vvvv   Dest Register   Valid in Legacy/Compatibility 32-bit modes?
1111B      XMM0/YMM0       Valid
1110B      XMM1/YMM1       Valid
1101B      XMM2/YMM2       Valid
1100B      XMM3/YMM3       Valid
1011B      XMM4/YMM4       Valid
1010B      XMM5/YMM5       Valid
1001B      XMM6/YMM6       Valid
1000B      XMM7/YMM7       Valid
0111B      XMM8/YMM8       Invalid
0110B      XMM9/YMM9       Invalid
0101B      XMM10/YMM10     Invalid
0100B      XMM11/YMM11     Invalid
0011B      XMM12/YMM12     Invalid
0010B      XMM13/YMM13     Invalid
0001B      XMM14/YMM14     Invalid
0000B      XMM15/YMM15     Invalid

The VEX.vvvv field is encoded in bit inverted format for accessing a register operand. 
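Table 2-8 amounts to taking the 1's complement of the 4-bit field. A minimal Python sketch of that mapping follows; the function name and the `mode_64bit` flag are hypothetical, and registers 8-15 are rejected outside 64-bit mode because the table marks them invalid there.

```python
def vvvv_to_register(vvvv: int, mode_64bit: bool) -> int:
    """Map the 4-bit VEX.vvvv field to an XMM/YMM register number (Table 2-8)."""
    if not 0 <= vvvv <= 0xF:
        raise ValueError("VEX.vvvv is a 4-bit field")
    reg = ~vvvv & 0xF  # undo the 1's complement encoding
    if reg > 7 and not mode_64bit:
        # XMM8/YMM8-XMM15/YMM15 are listed as invalid outside 64-bit mode.
        raise ValueError(f"XMM{reg}/YMM{reg} is not valid outside 64-bit mode")
    return reg
```

With this mapping, 1111b selects XMM0/YMM0, 1010b selects XMM5/YMM5, and 0000b selects XMM15/YMM15 (64-bit mode only).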


2.3.6 Instruction Operand Encoding and VEX.vvvv, ModR/M 

VEX-encoded instructions support three-operand and four-operand instruction syntax. Some VEX-encoded
instructions have syntax with fewer than three operands (e.g., VEX-encoded packed shift instructions support one
source operand and one destination operand).

The roles of VEX.vvvv, the reg field of the ModR/M byte (ModR/M.reg), and the r/m field of the ModR/M byte
(ModR/M.r/m) with respect to encoding destination and source operands vary with different types of instruction syntax.

The role of VEX.vvvv can be summarized in three situations:

• VEX.vvvv encodes the first source register operand, specified in inverted (1's complement) form, and is valid for
instructions with 2 or more source operands.

• VEX.vvvv encodes the destination register operand, specified in 1's complement form, for certain vector shifts.
The instructions where VEX.vvvv is used as a destination are listed in Table 2-9. The notation in the "Opcode"
column in Table 2-9 is described in detail in section 3.1.1.

• VEX.vvvv does not encode any operand; the field is reserved and should contain 1111b.


Table 2-9. Instructions with a VEX.vvvv destination

Opcode                          Instruction mnemonic
VEX.NDD.128.66.0F 73 /7 ib      VPSLLDQ xmm1, xmm2, imm8
VEX.NDD.128.66.0F 73 /3 ib      VPSRLDQ xmm1, xmm2, imm8
VEX.NDD.128.66.0F 71 /2 ib      VPSRLW xmm1, xmm2, imm8
VEX.NDD.128.66.0F 72 /2 ib      VPSRLD xmm1, xmm2, imm8
VEX.NDD.128.66.0F 73 /2 ib      VPSRLQ xmm1, xmm2, imm8
VEX.NDD.128.66.0F 71 /4 ib      VPSRAW xmm1, xmm2, imm8
VEX.NDD.128.66.0F 72 /4 ib      VPSRAD xmm1, xmm2, imm8
VEX.NDD.128.66.0F 71 /6 ib      VPSLLW xmm1, xmm2, imm8
VEX.NDD.128.66.0F 72 /6 ib      VPSLLD xmm1, xmm2, imm8
VEX.NDD.128.66.0F 73 /6 ib      VPSLLQ xmm1, xmm2, imm8
The role of the ModR/M.r/m field can be summarized in two situations:

• ModR/M.r/m encodes the instruction operand that references a memory address. 

• For some instructions that do not support memory addressing semantics, ModR/M.r/m encodes either the 
destination register operand or a source register operand. 

The role of the ModR/M.reg field can be summarized in two situations:

• ModR/M.reg encodes either the destination register operand or a source register operand. 

• For some instructions, ModR/M.reg is treated as an opcode extension and not used to encode any instruction 
operand. 
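To see all three roles in one instruction, the Python sketch below splits a 2-byte-VEX immediate shift from Table 2-9: VEX.vvvv names the destination, ModR/M.reg is the /digit opcode extension, and ModR/M.r/m names the source. The byte sequence C5 F1 73 DA 04 is hand-assembled and is our reading of VPSRLDQ xmm1, xmm2, 4 (VEX.NDD.128.66.0F 73 /3 ib); confirm it with an assembler before relying on it. The function name is hypothetical, and the sketch ignores VEX.R because ModR/M.reg is an opcode extension here.

```python
def decode_vex_ndd_shift(code: bytes) -> dict:
    """Split a 2-byte-VEX immediate shift (e.g. VPSRLDQ xmm1, xmm2, imm8) into operands.

    Layout assumed: C5h, VEX byte 1, opcode, ModR/M, imm8.
    """
    if len(code) != 5 or code[0] != 0xC5:
        raise ValueError("expected C5h, VEX byte 1, opcode, ModR/M, imm8")
    vex1, opcode, modrm, imm8 = code[1], code[2], code[3], code[4]
    if modrm >> 6 != 0b11:
        raise ValueError("expected a register source (ModR/M.mod = 11b)")
    return {
        "opcode": opcode,
        "extension": (modrm >> 3) & 0b111,  # ModR/M.reg acts as the /digit opcode extension
        "dest": ~(vex1 >> 3) & 0xF,         # VEX.vvvv (1's complement) names the destination
        "src": modrm & 0b111,               # ModR/M.r/m names the source register
        "imm8": imm8,
    }


fields = decode_vex_ndd_shift(bytes([0xC5, 0xF1, 0x73, 0xDA, 0x04]))
```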

For instruction syntax that supports four operands, VEX.vvvv, ModR/M.r/m, and ModR/M.reg encode three of the four
operands. Bits 7:4 of the immediate byte serve the following role:

• Imm8[7:4] encodes the third source register operand. 

2.3.6.1 3-byte VEX byte 1, bits[4:0] - "m-mmmm"

Bits[4:0] of the 3-byte VEX byte 1 encode an implied leading opcode byte (0F, 0F 38, or 0F 3A). Several bits are
reserved for future use and will #UD unless 0.


Table 2-10. VEX.m-mmmm interpretation

VEX.m-mmmm      Implied Leading Opcode Bytes
00000B          Reserved
00001B          0F
00010B          0F 38
00011B          0F 3A
00100-11111B    Reserved
(2-byte VEX)    0F


VEX.m-mmmm is only available on the 3-byte VEX. The 2-byte VEX implies a leading 0Fh opcode byte.
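Table 2-10 can be expressed as a lookup that treats every undefined value as reserved. The following Python sketch is illustrative only; the table and function names are hypothetical.

```python
# Implied leading opcode bytes for the defined VEX.m-mmmm values (Table 2-10).
_M_MMMM_TO_ESCAPE = {
    0b00001: b"\x0f",
    0b00010: b"\x0f\x38",
    0b00011: b"\x0f\x3a",
}


def implied_opcode_bytes(m_mmmm: int) -> bytes:
    """Return the implied leading opcode bytes for a 3-byte VEX prefix.

    A 2-byte VEX prefix has no m-mmmm field and always implies 0Fh.
    """
    try:
        return _M_MMMM_TO_ESCAPE[m_mmmm]
    except KeyError:
        raise ValueError(f"reserved VEX.m-mmmm value {m_mmmm:05b}b (#UD)") from None
```

Raising on reserved values mirrors the rule that such encodings #UD rather than being silently accepted.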

2.3.6.2 2-byte VEX byte 1, bit[2], and 3-byte VEX byte 2, bit [2] - "L"

The vector length field, VEX.L, is encoded in bit[2] of either the second byte of 2-byte VEX, or the third byte of
3-byte VEX. If "VEX.L = 1", it indicates 256-bit vector operation. "VEX.L = 0" indicates scalar and 128-bit vector
operations.

The instruction VZEROUPPER is a special case that is encoded with VEX.L = 0, although its operation zeroes bits
255:128 of all YMM registers accessible in the current operating mode.

See the following table. 


Table 2-11. VEX.L interpretation

VEX.L   Vector Length
0       128-bit (or 32/64-bit scalar)
1       256-bit


2.3.6.3 2-byte VEX byte 1, bits[1:0], and 3-byte VEX byte 2, bits [1:0] - "pp"

Up to one implied prefix is encoded by bits[1:0] of either the 2-byte VEX byte 1 or the 3-byte VEX byte 2. The prefix
behaves as if it were encoded prior to VEX, but after all other encoded prefixes.

See the following table. 
Table 2-12. VEX.pp interpretation

pp     Implies this prefix after other prefixes but before VEX
00B    None
01B    66
10B    F3
11B    F2
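Putting the R, vvvv, L, and pp descriptions together, the second byte of a 2-byte VEX prefix can be decoded as in the Python sketch below. The function and table names are hypothetical; the sketch only separates the fields and does not validate them against any particular opcode.

```python
# Implied SIMD prefix for each VEX.pp value (Table 2-12); None means no prefix.
_PP_TO_PREFIX = {0b00: None, 0b01: 0x66, 0b10: 0xF3, 0b11: 0xF2}


def decode_vex2_byte1(b1: int) -> dict:
    """Decode byte 1 of a 2-byte (C5h) VEX prefix: R, vvvv, L and pp."""
    if not 0 <= b1 <= 0xFF:
        raise ValueError("expected a single byte")
    return {
        "R": ((b1 >> 7) & 1) ^ 1,                    # bit 7, inverted REX.R
        "vvvv_reg": ~(b1 >> 3) & 0xF,                # bits 6:3, 1's complement register number
        "L": (b1 >> 2) & 1,                          # bit 2: 0 = 128-bit/scalar, 1 = 256-bit
        "implied_prefix": _PP_TO_PREFIX[b1 & 0b11],  # bits 1:0
    }
```

For example, byte F1h decodes to R = 0, register 1 in VEX.vvvv, VEX.L = 0, and an implied 66h prefix.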


2.3.7 The Opcode Byte 

One (and only one) opcode byte follows the 2- or 3-byte VEX. Legal opcodes are specified in Appendix B, in color.
Any instruction that uses an illegal opcode will #UD.


2.3.8 The MODRM, SIB, and Displacement Bytes 

The encodings are unchanged but the interpretation of reg_field or rm_field differs (see above). 


2.3.9 The Third Source Operand (Immediate Byte) 

VEX-encoded instructions can support a four-operand instruction syntax. VBLENDVPD, VBLENDVPS, and
VPBLENDVB use imm8[7:4] to encode one of the source registers.
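As a sketch of how the fourth operand travels in the immediate byte, the Python helpers below read and place a register number in imm8[7:4]. The helper names are hypothetical; unlike VEX.vvvv, the register number here is stored without inversion, so xmm4 as the fourth operand corresponds to imm8 = 40h.

```python
def is4_register(imm8: int) -> int:
    """Return the register number carried in imm8[7:4] (stored without inversion)."""
    if not 0 <= imm8 <= 0xFF:
        raise ValueError("imm8 must fit in one byte")
    return imm8 >> 4


def encode_is4(reg: int) -> int:
    """Place a register number in imm8[7:4], leaving imm8[3:0] zero."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0-15")
    return reg << 4
```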


2.3.10 AVX Instructions and the Upper 128-bits of YMM registers 

If an instruction with a destination XMM register is encoded with a VEX prefix, the processor zeroes the upper bits
(above bit 127) of the equivalent YMM register. Legacy SSE instructions without VEX preserve the upper bits.

2.3.10.1 Vector Length Transition and Programming Considerations 

An instruction encoded with a VEX.128 prefix that loads a YMM register operand operates as follows:

• Data is loaded into bits 127:0 of the register 

• Bits above bit 127 in the register are cleared. 

Thus, such an instruction clears bits 255:128 of a destination YMM register on processors with a maximum
vector-register width of 256 bits. In the event that future processors extend the vector registers to greater widths, an
instruction encoded with a VEX.128 or VEX.256 prefix will also clear any bits beyond bit 255. (This is in contrast 
with legacy SSE instructions, which have no VEX prefix; these modify only bits 127:0 of any destination register 
operand.) 
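The difference between the two write behaviors can be modeled in Python by treating a vector register as an arbitrarily wide integer, which also covers any hypothetical bits beyond 255. This is a behavioral sketch only; the function names are hypothetical and `old_reg` is kept in `vex128_write` solely for symmetry.

```python
MASK_128 = (1 << 128) - 1  # bits 127:0


def vex128_write(old_reg: int, result: int) -> int:
    """VEX.128 destination: bits 127:0 receive the result; every higher bit is cleared."""
    return result & MASK_128


def legacy_sse_write(old_reg: int, result: int) -> int:
    """Legacy SSE destination: bits 127:0 receive the result; higher bits are preserved."""
    return (old_reg & ~MASK_128) | (result & MASK_128)


old = (0xAA << 128) | 0x1234  # some data already present in bits 255:128
```

Writing 55h through `vex128_write` leaves exactly 55h, while `legacy_sse_write` keeps the AAh pattern in bits 255:128.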

Programmers should bear in mind that instructions encoded with VEX.128 and VEX.256 prefixes will clear any
future extensions to the vector registers. A calling function that uses such extensions should save their state before
calling legacy functions. This is not possible for involuntary calls (e.g., into an interrupt-service routine). It is
recommended that software handling involuntary calls accommodate this by not executing instructions encoded
with VEX.128 and VEX.256 prefixes. If it is not possible or desirable to restrict these instructions, software must
take special care to avoid actions that would, on future processors, zero the upper bits of vector registers.

Processors that support further vector-register extensions (defining bits beyond bit 255) will also extend the
XSAVE and XRSTOR instructions to save and restore these extensions. To ensure forward compatibility, software
that handles involuntary calls and that uses instructions encoded with VEX.128 and VEX.256 prefixes should first
save and then restore the vector registers (with any extensions) using the XSAVE and XRSTOR instructions with
save/restore masks that set bits that correspond to all vector-register extensions. Ideally, software should rely on
a mechanism that is cognizant of which bits to set (e.g., an OS mechanism that sets the save/restore mask bits
for all vector-register extensions that are enabled in XCR0). Saving and restoring state with instructions other than
XSAVE and XRSTOR will, on future processors with wider vector registers, corrupt the extended state of the vector
registers, even if doing so functions correctly on processors supporting 256-bit vector registers. (The same is true
if XSAVE and XRSTOR are used with a save/restore mask that does not set bits corresponding to all supported
extensions to the vector registers.)


2.3.11 AVX Instruction Length 

The AVX instructions described in this document (including VEX and ignoring other prefixes) do not exceed 11
bytes in length, although this limit may increase in the future. The maximum length of an Intel 64 and IA-32
instruction remains 15 bytes.


2.3.12 Vector SIB (VSIB) Memory Addressing 

In Intel® Advanced Vector Extensions 2 (Intel® AVX2), an SIB byte that follows the ModR/M byte can support VSIB
memory addressing to an array of linear addresses. VSIB addressing is only supported in a subset of Intel AVX2
instructions. VSIB memory addressing requires a 32-bit or 64-bit effective address. In 32-bit mode, VSIB addressing
is not supported when the address size attribute is overridden to 16 bits. In 16-bit protected mode, VSIB memory
addressing is permitted if the address size attribute is overridden to 32 bits. Additionally, VSIB memory addressing is
supported only with the VEX prefix.

In VSIB memory addressing, the SIB byte consists of: 

• The scale field (bits 7:6) specifies the scale factor.

• The index field (bits 5:3) specifies the register number of the vector index register; each element in the vector
register specifies an index.

• The base field (bits 2:0) specifies the register number of the base register.
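The field layout above can be checked with a short Python sketch that splits a SIB byte into its scale factor, index field, and base field. The function name is hypothetical. For example, 5Ah yields a scale of 2, index 011b (VR3), and base 010b (EDX), matching the corresponding entry of the 32-bit VSIB table.

```python
def split_sib(sib: int) -> tuple:
    """Split a (V)SIB byte into (scale factor, index field, base field)."""
    if not 0 <= sib <= 0xFF:
        raise ValueError("expected a single byte")
    scale = 1 << (sib >> 6)     # ss = bits 7:6 -> 1, 2, 4 or 8
    index = (sib >> 3) & 0b111  # bits 5:3 -> vector index register VRn
    base = sib & 0b111          # bits 2:0 -> base general-purpose register
    return scale, index, base
```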

Table 2-13 shows the 32-bit VSIB addressing form. It is organized to give 256 possible values of the SIB byte (in
hexadecimal). General purpose registers used as a base are indicated across the top of the table, along with
corresponding values for the SIB byte's base field. The register names also include R8L-R15L, applicable only in
64-bit mode (when the address size override prefix is used, but the value of VEX.B is not shown in Table 2-13). In
32-bit mode, R8L-R15L do not apply.

Table rows in the body of the table indicate the vector index register used as the index field, with each supported
scaling factor shown separately. Vector registers used in the index field can be XMM or YMM registers. The
leftmost column includes vector registers VR8-VR15 (i.e., XMM8/YMM8-XMM15/YMM15), which are only available in
64-bit mode and do not apply if encoding in 32-bit mode.


Table 2-13. 32-Bit VSIB Addressing Forms of the SIB Byte

r32                     EAX/ ECX/ EDX/ EBX/ ESP/ EBP/ ESI/ EDI/
                        R8L  R9L  R10L R11L R12L R13L R14L R15L
(In decimal) Base =     0    1    2    3    4    5    6    7
(In binary) Base =      000  001  010  011  100  101  110  111
Scaled Index  ss  Index Value of SIB Byte (in Hexadecimal)
VR0/VR8   *1  00  000   00   01   02   03   04   05   06   07
VR1/VR9   *1  00  001   08   09   0A   0B   0C   0D   0E   0F
VR2/VR10  *1  00  010   10   11   12   13   14   15   16   17
VR3/VR11  *1  00  011   18   19   1A   1B   1C   1D   1E   1F
VR4/VR12  *1  00  100   20   21   22   23   24   25   26   27
VR5/VR13  *1  00  101   28   29   2A   2B   2C   2D   2E   2F
VR6/VR14  *1  00  110   30   31   32   33   34   35   36   37
VR7/VR15  *1  00  111   38   39   3A   3B   3C   3D   3E   3F
VR0/VR8   *2  01  000   40   41   42   43   44   45   46   47
VR1/VR9   *2  01  001   48   49   4A   4B   4C   4D   4E   4F
VR2/VR10  *2  01  010   50   51   52   53   54   55   56   57
VR3/VR11  *2  01  011   58   59   5A   5B   5C   5D   5E   5F
VR4/VR12  *2  01  100   60   61   62   63   64   65   66   67
VR5/VR13  *2  01  101   68   69   6A   6B   6C   6D   6E   6F
VR6/VR14  *2  01  110   70   71   72   73   74   75   76   77
VR7/VR15  *2  01  111   78   79   7A   7B   7C   7D   7E   7F
VR0/VR8   *4  10  000   80   81   82   83   84   85   86   87
VR1/VR9   *4  10  001   88   89   8A   8B   8C   8D   8E   8F
VR2/VR10  *4  10  010   90   91   92   93   94   95   96   97
VR3/VR11  *4  10  011   98   99   9A   9B   9C   9D   9E   9F
VR4/VR12  *4  10  100   A0   A1   A2   A3   A4   A5   A6   A7
VR5/VR13  *4  10  101   A8   A9   AA   AB   AC   AD   AE   AF
VR6/VR14  *4  10  110   B0   B1   B2   B3   B4   B5   B6   B7
VR7/VR15  *4  10  111   B8   B9   BA   BB   BC   BD   BE   BF
VR0/VR8   *8  11  000   C0   C1   C2   C3   C4   C5   C6   C7
VR1/VR9   *8  11  001   C8   C9   CA   CB   CC   CD   CE   CF
VR2/VR10  *8  11  010   D0   D1   D2   D3   D4   D5   D6   D7
VR3/VR11  *8  11  011   D8   D9   DA   DB   DC   DD   DE   DF
VR4/VR12  *8  11  100   E0   E1   E2   E3   E4   E5   E6   E7
VR5/VR13  *8  11  101   E8   E9   EA   EB   EC   ED   EE   EF
VR6/VR14  *8  11  110   F0   F1   F2   F3   F4   F5   F6   F7
VR7/VR15  *8  11  111   F8   F9   FA   FB   FC   FD   FE   FF


NOTES:

1. Applies to the EBP/R13L base column (base = 101b). If ModR/M.mod = 00b, the base address is zero and the effective
address is computed as [scaled vector index] + disp32. Otherwise, the base address is computed as [EBP/R13] + disp,
where the displacement is either 8 bits or 32 bits depending on the value of ModR/M.mod:

MOD    Effective Address
00b    [Scaled Vector Register] + Disp32
01b    [Scaled Vector Register] + Disp8 + [EBP/R13]
10b    [Scaled Vector Register] + Disp32 + [EBP/R13]
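Because the index is a vector, VSIB produces one effective address per index element rather than a single address. The Python sketch below computes that list. It is a simplified model: the function name is hypothetical, the index elements are assumed to have already been read out of the vector register as signed Python integers, `base=None` stands for the ModR/M.mod = 00b case in the EBP/R13 column (no base, disp32 only), and results wrap to an assumed effective-address size.

```python
def vsib_element_addresses(indices, scale, disp, base=None, addr_bits=32):
    """Compute one effective address per vector-index element (VSIB addressing).

    indices   -- index elements as signed Python ints
    base      -- base register value, or None for ModR/M.mod = 00b with the
                 EBP/R13 base encoding (no base register, disp32 only)
    addr_bits -- effective address size used for wraparound
    """
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4 or 8")
    mask = (1 << addr_bits) - 1
    base_value = 0 if base is None else base
    return [(base_value + idx * scale + disp) & mask for idx in indices]
```

For example, indices [0, 1, 2, 3] with scale 4, disp 10h, and base 1000h give the addresses 1010h, 1014h, 1018h, and 101Ch.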


2.3.12.1 64-bit Mode VSIB Memory Addressing

In 64-bit mode, VSIB memory addressing uses the VEX.B field and the base field of the SIB byte to encode one of
the 16 general-purpose registers as the base register. The VEX.X field and the index field of the SIB byte encode one
of the 16 vector registers as the vector index register.

In 64-bit mode, the base registers in the top row of Table 2-13 should be interpreted as the full 64 bits of each register.
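In 64-bit mode the extension bits simply become bit 3 of each register number. The Python sketch below shows that combination; the function name is hypothetical, and `vex_x`/`vex_b` are assumed to be the logical values, already un-inverted from the prefix.

```python
def vsib_registers_64(sib: int, vex_x: int, vex_b: int) -> tuple:
    """Return (scale, vector index register number, base GPR number) in 64-bit mode.

    vex_x and vex_b are the logical values, i.e. already un-inverted from the prefix.
    """
    scale = 1 << (sib >> 6)
    index = (vex_x << 3) | ((sib >> 3) & 0b111)  # VR0-VR15 (XMM/YMM0-15)
    base = (vex_b << 3) | (sib & 0b111)          # 0-15: RAX, RCX, ..., R15
    return scale, index, base
```

With SIB = 5Ah and both extension bits set, the index register becomes VR11 and the base register becomes R10.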


2.4 AVX AND SSE INSTRUCTION EXCEPTION SPECIFICATION 

To look up the exceptions of legacy 128-bit SIMD instructions, 128-bit VEX-encoded instructions, and 256-bit
VEX-encoded instructions, use Table 2-14, which summarizes the exception behavior into separate classes, with detailed
exception conditions defined in sub-sections 2.4.1 through 2.5.1. For example, ADDPS contains the entry:

"See Exceptions Type 2" 

In this entry, "Type 2" can be looked up in Table 2-14.

The instruction's corresponding CPUID feature flag can be identified in the fourth column of the Instruction 
summary table. 

Note: #UD on CPUID feature flags=0 is not guaranteed in a virtualized environment if the hardware supports the 
feature flag. 


NOTE 

Instructions that operate only with MMX, X87, or general-purpose registers are not covered by the 
exception classes defined in this section. For instructions that operate on MMX registers, see 
Section 22.25.3, "Exception Conditions of Legacy SIMD Instructions Operating on MMX Registers" 
in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. 
Table 2-14. Exception class description

Exception Class   Instruction set        Mem arg                                        Floating-Point Exceptions (#XM)
Type 1            AVX, Legacy SSE        16/32 byte explicitly aligned                  None
Type 2            AVX, Legacy SSE        16/32 byte not explicitly aligned              Yes
Type 3            AVX, Legacy SSE        < 16 byte                                      Yes
Type 4            AVX, Legacy SSE        16/32 byte not explicitly aligned              No
Type 5            AVX, Legacy SSE        < 16 byte                                      No
Type 6            AVX (no Legacy SSE)    Varies                                         (At present, none do)
Type 7            AVX, Legacy SSE        None                                           None
Type 8            AVX                    None                                           None
Type 11           F16C                   8 or 16 byte, not explicitly aligned, no #AC   Yes
Type 12           AVX2                   Not explicitly aligned, no #AC                 No

See Table 2-15 for lists of instructions in each exception class. 
Table 2-15. Instructions in each Exception Class

Type 1
    (V)MOVAPD, (V)MOVAPS, (V)MOVDQA, (V)MOVNTDQ, (V)MOVNTDQA, (V)MOVNTPD, (V)MOVNTPS

Type 2
    (V)ADDPD, (V)ADDPS, (V)ADDSUBPD, (V)ADDSUBPS, (V)CMPPD, (V)CMPPS, (V)CVTDQ2PS, (V)CVTPD2DQ,
    (V)CVTPD2PS, (V)CVTPS2DQ, (V)CVTTPD2DQ, (V)CVTTPS2DQ, (V)DIVPD, (V)DIVPS, (V)DPPD*, (V)DPPS*,
    VFMADD132PD, VFMADD213PD, VFMADD231PD, VFMADD132PS, VFMADD213PS, VFMADD231PS,
    VFMADDSUB132PD, VFMADDSUB213PD, VFMADDSUB231PD, VFMADDSUB132PS, VFMADDSUB213PS,
    VFMADDSUB231PS, VFMSUBADD132PD, VFMSUBADD213PD, VFMSUBADD231PD, VFMSUBADD132PS,
    VFMSUBADD213PS, VFMSUBADD231PS, VFMSUB132PD, VFMSUB213PD, VFMSUB231PD, VFMSUB132PS,
    VFMSUB213PS, VFMSUB231PS, VFNMADD132PD, VFNMADD213PD, VFNMADD231PD, VFNMADD132PS,
    VFNMADD213PS, VFNMADD231PS, VFNMSUB132PD, VFNMSUB213PD, VFNMSUB231PD, VFNMSUB132PS,
    VFNMSUB213PS, VFNMSUB231PS, (V)HADDPD, (V)HADDPS, (V)HSUBPD, (V)HSUBPS, (V)MAXPD, (V)MAXPS,
    (V)MINPD, (V)MINPS, (V)MULPD, (V)MULPS, (V)ROUNDPS, (V)SQRTPD, (V)SQRTPS, (V)SUBPD, (V)SUBPS

Type 3
    (V)ADDSD, (V)ADDSS, (V)CMPSD, (V)CMPSS, (V)COMISD, (V)COMISS, (V)CVTPS2PD, (V)CVTSD2SI, (V)CVTSD2SS,
    (V)CVTSI2SD, (V)CVTSI2SS, (V)CVTSS2SD, (V)CVTSS2SI, (V)CVTTSD2SI, (V)CVTTSS2SI, (V)DIVSD, (V)DIVSS,
    VFMADD132SD, VFMADD213SD, VFMADD231SD, VFMADD132SS, VFMADD213SS, VFMADD231SS,
    VFMSUB132SD, VFMSUB213SD, VFMSUB231SD, VFMSUB132SS, VFMSUB213SS, VFMSUB231SS,
    VFNMADD132SD, VFNMADD213SD, VFNMADD231SD, VFNMADD132SS, VFNMADD213SS, VFNMADD231SS,
    VFNMSUB132SD, VFNMSUB213SD, VFNMSUB231SD, VFNMSUB132SS, VFNMSUB213SS, VFNMSUB231SS,
    (V)MAXSD, (V)MAXSS, (V)MINSD, (V)MINSS, (V)MULSD, (V)MULSS, (V)ROUNDSD, (V)ROUNDSS, (V)SQRTSD,
    (V)SQRTSS, (V)SUBSD, (V)SUBSS, (V)UCOMISD, (V)UCOMISS

Type 4
    (V)AESDEC, (V)AESDECLAST, (V)AESENC, (V)AESENCLAST, (V)AESIMC, (V)AESKEYGENASSIST, (V)ANDPD,
    (V)ANDPS, (V)ANDNPD, (V)ANDNPS, (V)BLENDPD, (V)BLENDPS, VBLENDVPD, VBLENDVPS, (V)LDDQU***,
    (V)MASKMOVDQU, (V)PTEST, VTESTPS, VTESTPD, (V)MOVDQU*, (V)MOVSHDUP, (V)MOVSLDUP, (V)MOVUPD*,
    (V)MOVUPS*, (V)MPSADBW, (V)ORPD, (V)ORPS, (V)PABSB, (V)PABSW, (V)PABSD, (V)PACKSSWB, (V)PACKSSDW,
    (V)PACKUSWB, (V)PACKUSDW, (V)PADDB, (V)PADDW, (V)PADDD, (V)PADDQ, (V)PADDSB, (V)PADDSW,
    (V)PADDUSB, (V)PADDUSW, (V)PALIGNR, (V)PAND, (V)PANDN, (V)PAVGB, (V)PAVGW, (V)PBLENDVB,
    (V)PBLENDW, (V)PCMP(E/I)STRI/M***, (V)PCMPEQB, (V)PCMPEQW, (V)PCMPEQD, (V)PCMPEQQ, (V)PCMPGTB,
    (V)PCMPGTW, (V)PCMPGTD, (V)PCMPGTQ, (V)PCLMULQDQ, (V)PHADDW, (V)PHADDD, (V)PHADDSW,
    (V)PHMINPOSUW, (V)PHSUBD, (V)PHSUBW, (V)PHSUBSW, (V)PMADDWD, (V)PMADDUBSW, (V)PMAXSB,
    (V)PMAXSW, (V)PMAXSD, (V)PMAXUB, (V)PMAXUW, (V)PMAXUD, (V)PMINSB, (V)PMINSW, (V)PMINSD,
    (V)PMINUB, (V)PMINUW, (V)PMINUD, (V)PMULHUW, (V)PMULHRSW, (V)PMULHW, (V)PMULLW, (V)PMULLD,
    (V)PMULUDQ, (V)PMULDQ, (V)POR, (V)PSADBW, (V)PSHUFB, (V)PSHUFD, (V)PSHUFHW, (V)PSHUFLW, (V)PSIGNB,
    (V)PSIGNW, (V)PSIGND, (V)PSLLW, (V)PSLLD, (V)PSLLQ, (V)PSRAW, (V)PSRAD, (V)PSRLW, (V)PSRLD, (V)PSRLQ,
    (V)PSUBB, (V)PSUBW, (V)PSUBD, (V)PSUBQ, (V)PSUBSB, (V)PSUBSW, (V)PUNPCKHBW, (V)PUNPCKHWD,
    (V)PUNPCKHDQ, (V)PUNPCKHQDQ, (V)PUNPCKLBW, (V)PUNPCKLWD, (V)PUNPCKLDQ, (V)PUNPCKLQDQ,
    (V)PXOR, (V)RCPPS, (V)RSQRTPS, (V)SHUFPD, (V)SHUFPS, (V)UNPCKHPD, (V)UNPCKHPS, (V)UNPCKLPD,
    (V)UNPCKLPS, (V)XORPD, (V)XORPS, VPBLENDD, VPERMD, VPERMPS, VPERMPD, VPERMQ, VPSLLVD, VPSLLVQ,
    VPSRAVD, VPSRLVD, VPSRLVQ, VPERMILPD, VPERMILPS, VPERM2F128

Type 5
    (V)CVTDQ2PD, (V)EXTRACTPS, (V)INSERTPS, (V)MOVD, (V)MOVQ, (V)MOVDDUP, (V)MOVLPD, (V)MOVLPS,
    (V)MOVHPD, (V)MOVHPS, (V)MOVSD, (V)MOVSS, (V)PEXTRB, (V)PEXTRD, (V)PEXTRW, (V)PEXTRQ, (V)PINSRB,
    (V)PINSRD, (V)PINSRW, (V)PINSRQ, (V)RCPSS, (V)RSQRTSS, (V)PMOVSX/ZX, VLDMXCSR*, VSTMXCSR

Type 6
    VEXTRACTF128, VBROADCASTSS, VBROADCASTSD, VBROADCASTF128, VINSERTF128, VMASKMOVPS**,
    VMASKMOVPD**, VPMASKMOVD, VPMASKMOVQ, VBROADCASTI128, VPBROADCASTB, VPBROADCASTD,
    VPBROADCASTW, VPBROADCASTQ, VEXTRACTI128, VINSERTI128, VPERM2I128

Type 7
    (V)MOVLHPS, (V)MOVHLPS, (V)MOVMSKPD, (V)MOVMSKPS, (V)PMOVMSKB, (V)PSLLDQ, (V)PSRLDQ, (V)PSLLW,
    (V)PSLLD, (V)PSLLQ, (V)PSRAW, (V)PSRAD, (V)PSRLW, (V)PSRLD, (V)PSRLQ

Type 8
    VZEROALL, VZEROUPPER

Type 11
    VCVTPH2PS, VCVTPS2PH

Type 12
    VGATHERDPS, VGATHERDPD, VGATHERQPS, VGATHERQPD, VPGATHERDD, VPGATHERDQ, VPGATHERQD,
    VPGATHERQQ


(*) - Additional exception restrictions are present - see the Instruction description for details 

(**) - Instruction behavior on alignment check reporting with mask bits of less than all 1s is the same as with mask
bits of all 1s, i.e., no alignment checks are performed.




(***) - PCMPESTRI, PCMPESTRM, PCMPISTRI, PCMPISTRM and LDDQU instructions do not cause #GP if the memory operand
is not aligned to a 16-byte boundary.

Table 2-15 classifies exception behaviors for AVX instructions. Within each class of exception conditions that are
listed in Table 2-18 through Table 2-27, certain subsets of AVX instructions may be subject to a #UD exception
depending on the encoded value of the VEX.L field. Table 2-16 and Table 2-17 provide supplemental information on
AVX instructions that may be subject to a #UD exception if encoded with incorrect values in the VEX.W or VEX.L field.


Table 2-16. #UD Exception and VEX.W=1 Encoding

Type 4
    #UD If VEX.W = 1 in all modes: VBLENDVPD, VBLENDVPS, VPBLENDVB, VTESTPD, VTESTPS, VPBLENDD, VPERMD,
    VPERMPS, VPERM2I128, VPSRAVD, VPERMILPD, VPERMILPS, VPERM2F128
Type 5
    #UD If VEX.W = 1 in non-64-bit modes: VPEXTRQ, VPINSRQ
Type 6
    #UD If VEX.W = 1 in all modes: VEXTRACTF128, VBROADCASTSS, VBROADCASTSD, VBROADCASTF128,
    VINSERTF128, VMASKMOVPS, VMASKMOVPD, VBROADCASTI128, VPBROADCASTB/W/D, VEXTRACTI128, VINSERTI128
Type 11
    #UD If VEX.W = 1 in all modes: VCVTPH2PS, VCVTPS2PH
Types 1, 2, 3, 7, 8, and 12: no entries in either column.
Table 2-17. #UD Exception and VEX.L Field Encoding

Type 1
    #UD If (VEX.L = 1 && AVX2 not present && AVX present): VMOVNTDQA
Type 2
    #UD If (VEX.L = 1 && AVX2 not present && AVX present): VDPPD
    #UD If (VEX.L = 1 && AVX2 present): VDPPD
Type 4
    #UD If (VEX.L = 1 && AVX2 not present && AVX present): VMASKMOVDQU, VMPSADBW, VPABSB/W/D,
    VPACKSSWB/DW, VPACKUSWB/DW, VPADDB/W/D, VPADDQ, VPADDSB/W, VPADDUSB/W, VPALIGNR, VPAND,
    VPANDN, VPAVGB/W, VPBLENDVB, VPBLENDW, VPCMP(E/I)STRI/M, VPCMPEQB/W/D/Q, VPCMPGTB/W/D/Q,
    VPHADDW/D, VPHADDSW, VPHMINPOSUW, VPHSUBD/W, VPHSUBSW, VPMADDWD, VPMADDUBSW, VPMAXSB/W/D,
    VPMAXUB/W/D, VPMINSB/W/D, VPMINUB/W/D, VPMULHUW, VPMULHRSW, VPMULHW/LW, VPMULLD,
    VPMULUDQ, VPMULDQ, VPOR, VPSADBW, VPSHUFB/D, VPSHUFHW/LW, VPSIGNB/W/D, VPSLLW/D/Q, VPSRAW/D,
    VPSRLW/D/Q, VPSUBB/W/D/Q, VPSUBSB/W, VPUNPCKHBW/WD/DQ, VPUNPCKHQDQ,
    VPUNPCKLBW/WD/DQ, VPUNPCKLQDQ, VPXOR
    #UD If (VEX.L = 1 && AVX2 present): VPCMP(E/I)STRI/M, VPHMINPOSUW
Type 5
    #UD If (VEX.L = 1 && AVX2 not present && AVX present): VEXTRACTPS, VINSERTPS, VMOVD, VMOVQ, VMOVLPD,
    VMOVLPS, VMOVHPD, VMOVHPS, VPEXTRB, VPEXTRD, VPEXTRW, VPEXTRQ, VPINSRB, VPINSRD, VPINSRW,
    VPINSRQ, VPMOVSX/ZX, VLDMXCSR, VSTMXCSR
    #UD If (VEX.L = 1 && AVX2 present): same instructions as the previous entry
Type 6
    #UD If VEX.L = 0: VEXTRACTF128, VPERM2F128, VBROADCASTSD, VBROADCASTF128, VINSERTF128
Type 7
    #UD If (VEX.L = 1 && AVX2 not present && AVX present): VMOVLHPS, VMOVHLPS, VPMOVMSKB, VPSLLDQ,
    VPSRLDQ, VPSLLW, VPSLLD, VPSLLQ, VPSRAW, VPSRAD, VPSRLW, VPSRLD, VPSRLQ
    #UD If (VEX.L = 1 && AVX2 present): VMOVLHPS, VMOVHLPS
Types 3, 8, 11, and 12: no entries in any column.
2.4.1 Exceptions Type 1 (Aligned memory reference)


Table 2-18. Type 1 Class Exception Conditions

Invalid Opcode, #UD
    Real, Virtual-8086: VEX prefix.
    Protected and Compatibility, 64-bit: VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18] = 0.
    All modes: Legacy SSE instruction: If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0.
    All modes: If preceded by a LOCK prefix (F0H).
    Protected and Compatibility, 64-bit: If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
    All modes: If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM
    All modes: If CR0.TS[bit 3] = 1.
Stack, #SS(0)
    Protected and Compatibility: For an illegal address in the SS segment.
    64-bit: If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0)
    Protected and Compatibility, 64-bit: VEX.256: Memory operand is not 32-byte aligned. VEX.128: Memory operand
    is not 16-byte aligned.
    All modes: Legacy SSE: Memory operand is not 16-byte aligned.
    Protected and Compatibility: For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
    64-bit: If the memory address is in a non-canonical form.
    Real, Virtual-8086: If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault, #PF(fault-code)
    Virtual-8086, Protected and Compatibility, 64-bit: For a page fault.
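The alignment rows of Table 2-18 reduce to a simple modulus check. The Python sketch below models only those rows for protected, compatibility, and 64-bit modes; the form labels ("legacy_sse", "vex128", "vex256") and the function name are hypothetical, and the other conditions in the table (for example, #UD for a VEX prefix in real mode) are not modeled.

```python
# Required alignment, in bytes, of a Type 1 memory operand (Table 2-18).
_TYPE1_ALIGNMENT = {"legacy_sse": 16, "vex128": 16, "vex256": 32}


def type1_misaligned(address: int, form: str) -> bool:
    """True if a Type 1 memory operand at address would raise #GP(0) for misalignment."""
    return address % _TYPE1_ALIGNMENT[form] != 0
```

An operand at 1010h is acceptable for VEX.128 and legacy SSE forms but would fault for a VEX.256 form, which needs 32-byte alignment.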
2.4.2 Exceptions Type 2 (>=16 Byte Memory Reference, Unaligned)


Table 2-19. Type 2 Class Exception Conditions

Invalid Opcode, #UD
    Real, Virtual-8086: VEX prefix.
    All modes: If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 0.
    Protected and Compatibility, 64-bit: VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18] = 0.
    All modes: Legacy SSE instruction: If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0.
    All modes: If preceded by a LOCK prefix (F0H).
    Protected and Compatibility, 64-bit: If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
    All modes: If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM
    All modes: If CR0.TS[bit 3] = 1.
Stack, #SS(0)
    Protected and Compatibility: For an illegal address in the SS segment.
    64-bit: If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0)
    All modes: Legacy SSE: Memory operand is not 16-byte aligned.
    Protected and Compatibility: For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
    64-bit: If the memory address is in a non-canonical form.
    Real, Virtual-8086: If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault, #PF(fault-code)
    Virtual-8086, Protected and Compatibility, 64-bit: For a page fault.
SIMD Floating-point Exception, #XM
    All modes: If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.
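The two CR4.OSXMMEXCPT rows of Table 2-19 decide which vector an unmasked SIMD floating-point exception uses. The Python sketch below combines that rule with the MXCSR flag and mask layout (flags in bits 5:0, masks in bits 12:7). The function name and the `raised` argument, a bit set in MXCSR flag order describing which conditions the instruction signalled, are hypothetical modeling choices.

```python
def simd_fp_fault(mxcsr: int, raised: int, osxmmexcpt: bool):
    """Return "#XM", "#UD", or None for a Type 2/3 instruction that raised SIMD FP conditions.

    raised uses the MXCSR flag layout (bit 0 IE, 1 DE, 2 ZE, 3 OE, 4 UE, 5 PE);
    the matching mask bits (IM..PM) are MXCSR bits 12:7.
    """
    masks = (mxcsr >> 7) & 0x3F
    unmasked = raised & ~masks & 0x3F
    if not unmasked:
        return None  # every signalled condition is masked: no exception is delivered
    return "#XM" if osxmmexcpt else "#UD"  # CR4.OSXMMEXCPT selects the vector


DEFAULT_MXCSR = 0x1F80  # MXCSR reset value: all six exceptions masked
```

With the reset value of MXCSR nothing faults; clearing the zero-divide mask (MXCSR = 1D80h) turns a divide-by-zero into #XM, or into #UD if CR4.OSXMMEXCPT = 0.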
2.4.3 Exceptions Type 3 (< 16 Byte memory argument)


Table 2-20. Type 3 Class Exception Conditions

Invalid Opcode, #UD
    Real, Virtual-8086: VEX prefix.
    All modes: If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 0.
    Protected and Compatibility, 64-bit: VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18] = 0.
    All modes: Legacy SSE instruction: If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0.
    All modes: If preceded by a LOCK prefix (F0H).
    Protected and Compatibility, 64-bit: If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
    All modes: If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM
    All modes: If CR0.TS[bit 3] = 1.
Stack, #SS(0)
    Protected and Compatibility: For an illegal address in the SS segment.
    64-bit: If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0)
    Protected and Compatibility: For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
    64-bit: If the memory address is in a non-canonical form.
    Real, Virtual-8086: If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault, #PF(fault-code)
    Virtual-8086, Protected and Compatibility, 64-bit: For a page fault.
Alignment Check, #AC(0)
    Virtual-8086, Protected and Compatibility, 64-bit: If alignment checking is enabled and an unaligned memory
    reference of 8 Bytes or less is made while the current privilege level is 3.
SIMD Floating-point Exception, #XM
    All modes: If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.
2.4.4 Exceptions Type 4 (>=16 Byte mem arg no alignment, no floating-point exceptions)


Table 2-21. Type 4 Class Exception Conditions

Invalid Opcode, #UD
    Real, Virtual-8086: VEX prefix.
    Protected and Compatibility, 64-bit: VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18] = 0.
    All modes: Legacy SSE instruction: If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0.
    All modes: If preceded by a LOCK prefix (F0H).
    Protected and Compatibility, 64-bit: If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
    All modes: If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM
    All modes: If CR0.TS[bit 3] = 1.
Stack, #SS(0)
    Protected and Compatibility: For an illegal address in the SS segment.
    64-bit: If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0)
    All modes: Legacy SSE: Memory operand is not 16-byte aligned (see note 1).
    Protected and Compatibility: For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
    64-bit: If the memory address is in a non-canonical form.
    Real, Virtual-8086: If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault, #PF(fault-code)
    Virtual-8086, Protected and Compatibility, 64-bit: For a page fault.


NOTES:

1. PCMPESTRI, PCMPESTRM, PCMPISTRI, PCMPISTRM and LDDQU instructions do not cause #GP if the memory operand is not
aligned to a 16-byte boundary.
2.4.5 Exceptions Type 5 (< 16 Byte mem arg and no FP exceptions)


Table 2-22. Type 5 Class Exception Conditions

Invalid Opcode, #UD
    Real, Virtual-8086: VEX prefix.
    Protected and Compatibility, 64-bit: VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18] = 0.
    All modes: Legacy SSE instruction: If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0.
    All modes: If preceded by a LOCK prefix (F0H).
    Protected and Compatibility, 64-bit: If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
    All modes: If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM
    All modes: If CR0.TS[bit 3] = 1.
Stack, #SS(0)
    Protected and Compatibility: For an illegal address in the SS segment.
    64-bit: If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0)
    Protected and Compatibility: For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
    64-bit: If the memory address is in a non-canonical form.
    Real, Virtual-8086: If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault, #PF(fault-code)
    Virtual-8086, Protected and Compatibility, 64-bit: For a page fault.
Alignment Check, #AC(0)
    Virtual-8086, Protected and Compatibility, 64-bit: If alignment checking is enabled and an unaligned memory
    reference is made while the current privilege level is 3.
2.4.6 Exceptions Type 6 (VEX-Encoded Instructions Without Legacy SSE Analogues)

Note: At present, the AVX instructions in this category do not generate floating-point exceptions. 


Table 2-23. Type 6 Class Exception Conditions

Exception | Real | Virtual-8086 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | | | VEX prefix.
Invalid Opcode, #UD | | | X | X | If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18]=0.
Invalid Opcode, #UD | | | X | X | If preceded by a LOCK prefix (F0H).
Invalid Opcode, #UD | | | X | X | If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
Invalid Opcode, #UD | | | X | X | If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM | | | X | X | If CR0.TS[bit 3]=1.
Stack, SS(0) | | | X | | For an illegal address in the SS segment.
Stack, SS(0) | | | | X | If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0) | | | X | | For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
General Protection, #GP(0) | | | | X | If the memory address is in a non-canonical form.
Page Fault #PF(fault-code) | | | X | X | For a page fault.
Alignment Check #AC(0) | | | X | X | For 4 or 8 byte memory references if alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
2.4.7 Exceptions Type 7 (No FP exceptions, no memory arg)


Table 2-24. Type 7 Class Exception Conditions

Exception | Real | Virtual-8086 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | | | VEX prefix.
Invalid Opcode, #UD | | | X | X | VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18]=0.
Invalid Opcode, #UD | X | X | X | X | Legacy SSE instruction: If CR0.EM[bit 2] = 1. If CR4.OSFXSR[bit 9] = 0.
Invalid Opcode, #UD | X | X | X | X | If preceded by a LOCK prefix (F0H).
Invalid Opcode, #UD | | | X | X | If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
Invalid Opcode, #UD | X | X | X | X | If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM | | | X | X | If CR0.TS[bit 3]=1.


2.4.8 Exceptions Type 8 (AVX and no memory argument) 


Table 2-25. Type 8 Class Exception Conditions

Exception | Real | Virtual-8086 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | | | Always in Real or Virtual-8086 mode.
Invalid Opcode, #UD | | | X | X | If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18]=0. If CPUID.01H.ECX.AVX[bit 28]=0. If VEX.vvvv ≠ 1111B.
Invalid Opcode, #UD | X | X | X | X | If preceded by a LOCK prefix (F0H).
Device Not Available, #NM | | | X | X | If CR0.TS[bit 3]=1.
2.4.9 Exception Type 11 (VEX-only, mem arg no AC, floating-point exceptions)


Table 2-26. Type 11 Class Exception Conditions

Exception | Real | Virtual-8086 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | | | VEX prefix.
Invalid Opcode, #UD | | | X | X | VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18]=0.
Invalid Opcode, #UD | X | X | X | X | If preceded by a LOCK prefix (F0H).
Invalid Opcode, #UD | | | X | X | If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
Invalid Opcode, #UD | X | X | X | X | If any corresponding CPUID feature flag is '0'.
Device Not Available, #NM | X | X | X | X | If CR0.TS[bit 3]=1.
Stack, SS(0) | | | X | | For an illegal address in the SS segment.
Stack, SS(0) | | | | X | If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0) | | | X | | For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
General Protection, #GP(0) | | | | X | If the memory address is in a non-canonical form.
General Protection, #GP(0) | X | X | | | If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault #PF(fault-code) | | X | X | X | For a page fault.
SIMD Floating-Point Exception, #XM | X | X | X | X | If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 1.
2.4.10 Exception Type 12 (VEX-only, VSIB mem arg, no AC, no floating-point exceptions)

Table 2-27. Type 12 Class Exception Conditions

Exception | Real | Virtual-8086 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | | | VEX prefix.
Invalid Opcode, #UD | | | X | X | VEX prefix: If XCR0[2:1] ≠ '11b'. If CR4.OSXSAVE[bit 18]=0.
Invalid Opcode, #UD | X | X | X | X | If preceded by a LOCK prefix (F0H).
Invalid Opcode, #UD | | | X | X | If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
Invalid Opcode, #UD | X | X | X | NA | If address size attribute is 16 bit.
Invalid Opcode, #UD | X | X | X | X | If ModR/M.mod = '11b'.
Invalid Opcode, #UD | X | X | X | X | If ModR/M.rm ≠ '100b'.
Invalid Opcode, #UD | X | X | X | X | If any corresponding CPUID feature flag is '0'.
Invalid Opcode, #UD | X | X | X | X | If any vector register is used more than once between the destination register, mask register and the index register in VSIB addressing.
Device Not Available, #NM | X | X | X | X | If CR0.TS[bit 3]=1.
Stack, SS(0) | | | X | | For an illegal address in the SS segment.
Stack, SS(0) | | | | X | If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0) | | | X | | For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
General Protection, #GP(0) | | | | X | If the memory address is in a non-canonical form.
General Protection, #GP(0) | X | X | | | If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault #PF(fault-code) | | X | X | X | For a page fault.
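The encoding-dependent #UD rows of Table 2-27 can be expressed as a small predicate. This C sketch is hypothetical: it assumes the caller has already extracted the ModR/M byte (mod in bits 7:6, rm in bits 2:0) and the full register numbers of the destination, mask and index registers. State-dependent checks (XCR0, CR4, CPUID, LOCK, and prefix ordering) are deliberately not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the operand-encoding #UD rows of Table 2-27.
 * dest, mask, idx: decoded vector register numbers of a VEX gather. */
bool vsib_gather_ud(uint8_t modrm, bool addr_size_16, bool mode64,
                    unsigned dest, unsigned mask, unsigned idx)
{
    uint8_t mod = (uint8_t)(modrm >> 6);   /* ModR/M bits 7:6 */
    uint8_t rm  = (uint8_t)(modrm & 0x7);  /* ModR/M bits 2:0 */

    if (!mode64 && addr_size_16)           /* 16-bit addressing (NA in 64-bit mode) */
        return true;
    if (mod == 0x3)                        /* ModR/M.mod = 11b: no memory operand */
        return true;
    if (rm != 0x4)                         /* ModR/M.rm must be 100b (SIB present) */
        return true;
    /* destination, mask and index must be three different registers */
    return dest == mask || dest == idx || mask == idx;
}
```

A caller would evaluate this after the state-dependent checks; any true result corresponds to an Invalid Opcode, #UD row of the table.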


2.5 VEX ENCODING SUPPORT FOR GPR INSTRUCTIONS 

The VEX prefix may be used to encode instructions that operate on neither YMM nor XMM registers. VEX-encoded
general-purpose-register instructions have the following properties:

• Instruction syntax support for three encodable operands. 

• Encoding support for instruction syntax of non-destructive source operand, destination operand encoded via 
VEX.vvvv, and destructive three-operand syntax. 

• Elimination of escape opcode byte (0FH), two-byte escape via a compact bit field representation within the VEX
prefix.

• Elimination of the need to use REX prefix to encode the extended half of general-purpose register sets (R8-R15) 
for direct register access or memory addressing. 

• Flexible and more compact bit fields are provided in the VEX prefix to retain the full functionality provided by
the REX prefix. REX.W, REX.X, REX.B functionalities are provided in the three-byte VEX prefix only.

• VEX-encoded GPR instructions are encoded with VEX.L=0. 
Any VEX-encoded GPR instruction with a 66H, F2H, or F3H prefix preceding VEX will #UD.
Any VEX-encoded GPR instruction with a REX prefix preceding VEX will #UD.
VEX-encoded GPR instructions are not supported in real and virtual 8086 modes. 
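To make the properties above concrete, here is a minimal C sketch that extracts the fields of a 2-byte (C5H) or 3-byte (C4H) VEX prefix. It assumes the standard VEX layout, which is not restated in this section: R, X, B and vvvv are stored in inverted (one's-complement) form, and only the 3-byte form carries X, B, W and mmmmm. The struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

struct vex_fields {
    bool r, x, b, w, l;   /* R, X, B stored here un-inverted */
    uint8_t mmmmm;        /* opcode map select (implied escape bytes) */
    uint8_t vvvv;         /* register specifier, un-inverted */
    uint8_t pp;           /* implied 66H/F3H/F2H prefix */
};

/* Decode a VEX prefix starting at p[0]. Returns the prefix length (2 or 3),
 * or 0 if p[0] is neither C5H nor C4H. */
int decode_vex(const uint8_t *p, struct vex_fields *out)
{
    if (p[0] == 0xC5) {                       /* 2-byte form */
        out->r = !(p[1] & 0x80);
        out->x = false;                       /* X, B, W are not encodable */
        out->b = false;
        out->w = false;
        out->mmmmm = 1;                       /* implies the 0FH escape */
        out->vvvv = ((uint8_t)~p[1] >> 3) & 0xF;
        out->l = (p[1] >> 2) & 1;
        out->pp = p[1] & 3;
        return 2;
    }
    if (p[0] == 0xC4) {                       /* 3-byte form */
        out->r = !(p[1] & 0x80);
        out->x = !(p[1] & 0x40);
        out->b = !(p[1] & 0x20);
        out->mmmmm = p[1] & 0x1F;
        out->w = (p[2] >> 7) & 1;
        out->vvvv = ((uint8_t)~p[2] >> 3) & 0xF;
        out->l = (p[2] >> 2) & 1;
        out->pp = p[2] & 3;
        return 3;
    }
    return 0;
}
```

Under these assumptions, the bytes C4 E2 60 (the VEX prefix of ANDN EAX, EBX, ECX) decode to mmmmm = 2 (the 0F38 map), vvvv = 3 (EBX), W = 0 and L = 0, consistent with VEX-encoded GPR instructions using VEX.L=0.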


2.5.1 Exception Conditions for VEX-Encoded GPR Instructions 

The exception conditions applicable to VEX-encoded GPR instructions differ from those of legacy GPR instructions.
Table 2-28 lists VEX-encoded GPR instructions. The exception conditions for VEX-encoded GPR instructions that have
a default operand size of 32 bits, and for which a 16-bit operand size is not encodable, are found in Table 2-29.


Table 2-28. VEX-Encoded GPR Instructions

Exception Class | Instruction
See Table 2-29 | ANDN, BLSI, BLSMSK, BLSR, BZHI, MULX, PDEP, PEXT, RORX, SARX, SHLX, SHRX

(*) - Additional exception restrictions are present - see the Instruction description for details.

Table 2-29. Exception Definition (VEX-Encoded GPR Instructions)

Exception | Real | Virtual-8086 | Protected and Compatibility | 64-bit | Cause of Exception
Invalid Opcode, #UD | X | X | X | X | If BMI1/BMI2 CPUID feature flag is '0'.
Invalid Opcode, #UD | X | X | | | If a VEX prefix is present.
Invalid Opcode, #UD | | | X | X | If any REX, F2, F3, or 66 prefixes precede a VEX prefix.
Stack, SS(0) | X | X | X | | For an illegal address in the SS segment.
Stack, SS(0) | | | | X | If a memory address referencing the SS segment is in a non-canonical form.
General Protection, #GP(0) | | | X | | For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. If the DS, ES, FS, or GS register is used to access memory and it contains a null segment selector.
General Protection, #GP(0) | | | | X | If the memory address is in a non-canonical form.
General Protection, #GP(0) | X | X | | | If any part of the operand lies outside the effective address space from 0 to FFFFH.
Page Fault #PF(fault-code) | | X | X | X | For a page fault.
Alignment Check #AC(0) | | X | X | X | If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.


2.6 INTEL® AVX-512 ENCODING 

The majority of the Intel AVX-512 family of instructions (operating on 512/256/128-bit vector register operands) 
are encoded using a new prefix (called EVEX). Opmask instructions (operating on opmask register operands) are 
encoded using the VEX prefix. The EVEX prefix has some parts resembling the instruction encoding scheme using 
the VEX prefix, and many other capabilities not available with the VEX prefix. 

The significant feature differences between EVEX and VEX are summarized below. 
• EVEX is a 4-byte prefix (the first byte must be 62H); VEX is either a 2-byte (C5H is the first byte) or 3-byte
(C4H is the first byte) prefix.

• EVEX prefix can encode 32 vector registers (XMM/YMM/ZMM) in 64-bit mode. 

• EVEX prefix can encode an opmask register for conditional processing or selection control in EVEX-encoded 
vector instructions. Opmask instructions, whose source/destination operands are opmask registers and treat 
the content of an opmask register as a single value, are encoded using the VEX prefix. 

• EVEX memory addressing with the disp8 form uses a compressed disp8 encoding scheme to improve the encoding
density of the instruction byte stream.

• EVEX prefix can encode functionality that is specific to instruction classes (e.g., packed instructions with
"load+op" semantic can support embedded broadcast functionality, floating-point instructions with rounding
semantic can support static rounding functionality, floating-point instructions with non-rounding arithmetic
semantic can support "suppress all exceptions" functionality).


2.6.1 Instruction Format and EVEX 

The placement of the EVEX prefix in an IA instruction is represented in Figure 2-10.


[Figure: [Prefixes] | EVEX (4 bytes) | Opcode (1 byte) | ModR/M (1 byte) | [SIB] (1 byte) | [Disp32] (4 bytes) or [Disp8*N] (1 byte) | [Immediate] (1 byte)]

Figure 2-10. AVX-512 Instruction Format and the EVEX Prefix


The EVEX prefix is a 4-byte prefix, with the first two bytes derived from an unused encoding form of the
32-bit-mode-only BOUND instruction. The layout of the EVEX prefix is shown in Figure 2-11. The first byte must be 62H, followed
by three payload bytes, denoted as P0, P1, and P2 individually or collectively as P[23:0] (see Figure 2-11).
Table 2-30. EVEX Prefix Bit Field Functional Grouping

Notation | Bit Field Group | Position | Comment
- | Reserved | P[3:2] | Must be 0.
- | Fixed Value | P[10] | Must be 1.
EVEX.mm | Compressed legacy escape | P[1:0] | Identical to low two bits of VEX.mmmmm.
EVEX.pp | Compressed legacy prefix | P[9:8] | Identical to VEX.pp.
EVEX.RXB | Next-8 register specifier modifier | P[7:5] | Combine with ModR/M.reg, ModR/M.rm (base, index/vidx).
EVEX.R' | High-16 register specifier modifier | P[4] | Combine with EVEX.R and ModR/M.reg.
EVEX.X | High-16 register specifier modifier | P[6] | Combine with EVEX.B and ModR/M.rm, when SIB/VSIB absent.
EVEX.vvvv | NDS register specifier | P[14:11] | Same as VEX.vvvv.
EVEX.V' | High-16 NDS/VIDX register specifier | P[19] | Combine with EVEX.vvvv or when VSIB present.
EVEX.aaa | Embedded opmask register specifier | P[18:16] |
EVEX.W | Osize promotion/Opcode extension | P[15] |
EVEX.z | Zeroing/Merging | P[23] |
EVEX.b | Broadcast/RC/SAE Context | P[20] |
EVEX.L'L | Vector length/RC | P[22:21] |



The bit fields in P[23:0] are divided into the following functional groups (Table 2-30 provides a tabular summary): 

• Reserved bits: P[3:2] must be 0, otherwise #UD. 

• Fixed-value bit: P[10] must be 1, otherwise #UD. 

• Compressed legacy prefix/escape bytes: P[1:0] is identical to the lowest 2 bits of VEX.mmmmm; P[9:8] is 
identical to VEX.pp. 

• Operand specifier modifier bits for vector register, general purpose register, memory addressing: P[7:5] allows 
access to the next set of 8 registers beyond the low 8 registers when combined with ModR/M register specifiers. 

• Operand specifier modifier bit for vector register: P[4] (or EVEX.R') allows access to the high 16 vector register
set when combined with P[7] and ModR/M.reg specifier; P[6] can also provide access to a high 16 vector
register when SIB or VSIB addressing is not needed.

• Non-destructive source/vector index operand specifier: P[19] and P[14:11] encode the second source vector
register operand in a non-destructive source syntax; the vector index register operand can access an upper 16
vector register using P[19].

• Op-mask register specifiers: P[18:16] encodes op-mask register set k0-k7 in instructions operating on vector 
registers. 

• EVEX.W: P[15] is similar to VEX.W which serves either as opcode extension bit or operand size promotion to 
64-bit in 64-bit mode. 

• Vector destination merging/zeroing: P[23] encodes the destination result behavior, which either zeroes the
masked elements or leaves masked elements unchanged.

• Broadcast/Static-rounding/SAE context bit: P[20] encodes multiple functionalities, which differ across different
classes of instructions and can affect the meaning of the remaining field (EVEX.L'L). The functionality for the
following instruction classes is:

— Broadcasting a single element across the destination vector register: this applies to the instruction class
with Load+Op semantic where one of the source operands is from memory.

— Redirect L'L field (P[22:21]) as static rounding control for floating-point instructions with rounding 
semantic. Static rounding control overrides MXCSR.RC field and implies "Suppress all exceptions" (SAE). 

— Enable SAE for floating-point instructions with arithmetic semantic that is not rounding.

— For instruction classes outside of the afore-mentioned three classes, setting EVEX.b will cause #UD. 
• Vector length/rounding control specifier: P[22:21] can serve one of three options.

— Vector length information for packed vector instructions. 

— Ignored for instructions operating on vector register content as a single data element. 

— Rounding control for floating-point instructions that have a rounding semantic and whose source and 
destination operands are all vector registers. 
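The field positions listed in Table 2-30 and the bullets above map directly onto shifts and masks of P[23:0]. The C sketch below (hypothetical names) splits the three payload bytes that follow 62H. It deliberately returns the raw stored values: as with VEX, EVEX.R, X, B, R', vvvv and V' are stored inverted (one's complement), which this section does not restate, so a decoder must still complement them.

```c
#include <stdint.h>

/* Raw EVEX payload fields, named after Table 2-30, exactly as stored. */
struct evex_raw {
    unsigned mm, reserved, r_prime, rxb, pp, fixed;
    unsigned vvvv, w, aaa, v_prime, b, ll, z;
};

/* p points at the 62H byte; P[23:0] = P2:P1:P0 = p[3]:p[2]:p[1]. */
struct evex_raw evex_split(const uint8_t *p)
{
    uint32_t P = (uint32_t)p[1] | ((uint32_t)p[2] << 8) | ((uint32_t)p[3] << 16);
    struct evex_raw f;
    f.mm       = P & 0x3;          /* P[1:0]   compressed legacy escape */
    f.reserved = (P >> 2) & 0x3;   /* P[3:2]   must be 0 */
    f.r_prime  = (P >> 4) & 0x1;   /* P[4]     EVEX.R' */
    f.rxb      = (P >> 5) & 0x7;   /* P[7:5]   EVEX.R, X, B (R is bit 2) */
    f.pp       = (P >> 8) & 0x3;   /* P[9:8]   compressed legacy prefix */
    f.fixed    = (P >> 10) & 0x1;  /* P[10]    must be 1 */
    f.vvvv     = (P >> 11) & 0xF;  /* P[14:11] NDS register specifier */
    f.w        = (P >> 15) & 0x1;  /* P[15]    EVEX.W */
    f.aaa      = (P >> 16) & 0x7;  /* P[18:16] embedded opmask */
    f.v_prime  = (P >> 19) & 0x1;  /* P[19]    EVEX.V' */
    f.b        = (P >> 20) & 0x1;  /* P[20]    broadcast/RC/SAE context */
    f.ll       = (P >> 21) & 0x3;  /* P[22:21] vector length/RC */
    f.z        = (P >> 23) & 0x1;  /* P[23]    zeroing/merging */
    return f;
}
```

For an illustrative payload 62 F1 6C C9, the sketch yields mm = 01b, the fixed bit set, stored vvvv = 1101b, aaa = 001b, z = 1 and L'L = 10b (512-bit).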


2.6.2 Register Specifier Encoding and EVEX 

EVEX-encoded instructions can access 8 opmask registers, 16 general-purpose registers and 32 vector registers in
64-bit mode (8 general-purpose registers and 8 vector registers in non-64-bit modes). EVEX encoding can support
instruction syntax that accesses up to 4 instruction operands. Normal memory addressing modes and VSIB memory
addressing are supported with EVEX prefix encoding. The mapping of register operands used by various instruction
syntaxes and memory addressing in 64-bit mode is shown in Table 2-31. Opmask register encoding is described in
Section 2.6.3.


Table 2-31. 32-Register Support in 64-bit Mode Using EVEX with Embedded REX Bits

Operand | 4(1) | 3 | [2:0] | Reg. Type | Common Usages
REG | EVEX.R' | REX.R | modrm.reg | GPR, Vector | Destination or Source
NDS/NDD | EVEX.V' | EVEX.vvvv | EVEX.vvvv | GPR, Vector | 2nd Source or Destination
RM | EVEX.X | EVEX.B | modrm.r/m | GPR, Vector | 1st Source or Destination
BASE | 0 | EVEX.B | modrm.r/m | GPR | Memory addressing
INDEX | 0 | EVEX.X | sib.index | GPR | Memory addressing
VIDX | EVEX.V' | EVEX.X | sib.index | Vector | VSIB memory addressing


NOTES: 

1. Not applicable for accessing general purpose registers. 
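Table 2-31 can be read as a recipe for building a 5-bit register number. The hypothetical C helpers below assume 64-bit mode, vector register operands (the GPR case excluded by Note 1 is not modeled), and the one's-complement storage of EVEX.R', R, X, B, V' and vvvv inherited from VEX, which this section does not restate. Inputs are the raw stored bits (0 or 1, or 4 bits for vvvv).

```c
#include <stdint.h>

/* REG operand: bit 4 = EVEX.R', bit 3 = EVEX.R, bits 2:0 = ModR/M.reg. */
unsigned evex_reg_operand(unsigned r_prime, unsigned r, uint8_t modrm)
{
    /* stored bits are inverted, so XOR with 1 recovers the register bit */
    return ((r_prime ^ 1u) << 4) | ((r ^ 1u) << 3) | ((modrm >> 3) & 7u);
}

/* NDS/NDD operand: bit 4 = EVEX.V', bits 3:0 = EVEX.vvvv. */
unsigned evex_nds_operand(unsigned v_prime, unsigned vvvv)
{
    return ((v_prime ^ 1u) << 4) | (~vvvv & 0xFu);
}

/* RM register operand (ModR/M.mod = 11b, no SIB): bit 4 = EVEX.X, bit 3 = EVEX.B. */
unsigned evex_rm_operand(unsigned x, unsigned b, uint8_t modrm)
{
    return ((x ^ 1u) << 4) | ((b ^ 1u) << 3) | (modrm & 7u);
}
```

With stored R' = R = X = B = V' = 1, stored vvvv = 1101b and ModR/M = CBH, the three operands come out as registers 1, 2 and 3; clearing the stored R' and R bits moves the REG operand into the upper register set.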


The mapping of register operands used by various instruction syntaxes and memory addressing in 32-bit modes is
shown in Table 2-32.


Table 2-32. EVEX Encoding Register Specifiers in 32-bit Mode

Operand | [2:0] | Reg. Type | Common Usages
REG | modrm.reg | GPR, Vector | Destination or Source
NDS/NDD | EVEX.vvv | GPR, Vector | 2nd Source or Destination
RM | modrm.r/m | GPR, Vector | 1st Source or Destination
BASE | modrm.r/m | GPR | Memory Addressing
INDEX | sib.index | GPR | Memory Addressing
VIDX | sib.index | Vector | VSIB Memory Addressing


2.6.3 Opmask Register Encoding 

There are eight opmask registers, k0-k7. Opmask register encoding falls into two categories: 

• Opmask registers that are the source or destination operands of an instruction treating the content of an opmask
register as a scalar value are encoded using the VEX prefix scheme. This scheme can support up to three operands using
the standard ModR/M byte's reg and rm fields and VEX.vvvv. Such a scalar opmask instruction does not support
conditional update of the destination operand.

• An opmask register providing conditional processing and/or conditional update of the destination register of a 
vector instruction is encoded using EVEX.aaa field (see Section 2.6.4). 
• An opmask register serving as the destination or source operand of a vector instruction is encoded using the
standard ModR/M byte's reg and rm fields.

Table 2-33. Opmask Register Specifier Encoding

Operand | [2:0] | Register Access | Common Usages
REG | modrm.reg | k0-k7 | Source
NDS | VEX.vvvv | k0-k7 | 2nd Source
RM | modrm.r/m | k0-k7 | 1st Source
{k1} | EVEX.aaa | k0(1)-k7 | Opmask

NOTES:

1. Instructions that overwrite the conditional mask in opmask do not permit using k0 as the embedded mask.


2.6.4 Masking Support in EVEX 

EVEX can encode an opmask register to conditionally control per-element computational operation and updating of
the result of an instruction to the destination operand. The predicate operand is known as the opmask register. The
EVEX.aaa field, P[18:16] of the EVEX prefix, is used to encode one out of a set of eight 64-bit architectural registers.
Note that from this set of 8 architectural registers, only k1 through k7 can be addressed as predicate operands.
k0 can be used as a regular source or destination but cannot be encoded as a predicate operand.

AVX-512 instructions support two types of masking with EVEX.z bit (P[23]) controlling the type of masking: 

• Merging-masking, which is the default type of masking for EVEX-encoded vector instructions, preserves the old 
value of each element of the destination where the corresponding mask bit has a 0. It corresponds to the case 
of EVEX.z = 0. 

• Zeroing-masking is enabled by setting the EVEX.z bit to 1. In this case, an element of the destination is set
to 0 when the corresponding mask bit has a 0 value.
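The difference between merging-masking and zeroing-masking can be shown with a scalar loop over 16 single-precision lanes (one 512-bit vector). This is a behavioral sketch in C, not a description of how hardware implements masking; the function name and the 16 x float layout are assumptions. The aaa = 000b case follows Note 3 of Table 2-40: no masking, as if the mask were all ones.

```c
#include <stdbool.h>
#include <stdint.h>

/* Write 'result' into 'dst' under opmask k (one bit per lane).
 * aaa: EVEX.aaa; zeroing: EVEX.z. */
void apply_opmask(float dst[16], const float result[16], uint16_t k,
                  unsigned aaa, bool zeroing)
{
    if (aaa == 0)
        k = 0xFFFF;                   /* k0 is not a predicate: process every lane */
    for (int i = 0; i < 16; i++) {
        if (k & (1u << i))
            dst[i] = result[i];       /* mask bit 1: element is updated */
        else if (zeroing)
            dst[i] = 0.0f;            /* zeroing-masking (EVEX.z = 1) */
        /* merging-masking (EVEX.z = 0): dst[i] keeps its old value */
    }
}
```

With k = 0005H only lanes 0 and 2 receive new values; the other lanes either keep their previous contents (merging) or become 0 (zeroing).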

AVX-512 Foundation instructions can be divided into the following groups: 

• Instructions which support "zeroing-masking". 

— Also allow merging-masking. 

• Instructions which require aaa = 000. 

— Do not allow any form of masking. 

• Instructions which allow merging-masking but do not allow zeroing-masking. 

— Require EVEX.z to be set to 0. 

— This group is mostly composed of instructions that write to memory. 

• Instructions which require aaa <> 000 do not allow EVEX.z to be set to 1. 

— Allow merging-masking and do not allow zeroing-masking, e.g., gather instructions. 


2.6.5 Compressed Displacement (disp8*N) Support in EVEX 

For memory addressing using the disp8 form, EVEX-encoded instructions always use a compressed displacement
scheme, multiplying disp8 by a scaling factor N that is determined based on the vector length,
the value of the EVEX.b bit (embedded broadcast) and the input element size of the instruction. In general, the factor
N corresponds to the number of bytes characterizing the internal memory operation of the input operand (e.g., 64
when accessing a full 512-bit memory vector). The scale factor N is listed in Table 2-34 and Table 2-35 below,
where EVEX-encoded instructions are classified using the tupletype attribute. The scale factor N of each tupletype
is listed based on the vector length (VL) and other factors affecting it.

Table 2-34 covers EVEX-encoded instructions that have a load semantic in conjunction with an additional computational
or data element movement operation, operating either on the full vector or on half the vector (due to conversion of
numerical precision from a wider format to a narrower format). EVEX.b is supported for such instructions for data
element sizes that are either dword or qword (see Section 2.6.11).

EVEX-encoded instructions that are pure load/store, and "Load+op" instructions that operate on data
element sizes less than dword, do not support broadcasting using EVEX.b. These are listed in Table 2-35. Table 2-35
also includes many broadcast instructions which perform broadcast using a subset of data elements without using
EVEX.b. These instructions and a few data element size conversion instructions are covered in Table 2-35. Instructions
classified in Table 2-35 do not use EVEX.b and EVEX.b must be 0, otherwise #UD will occur.

The tupletype abbreviation will be referenced in the instruction operand encoding table in the reference page of 
each instruction, providing the cross reference for the scaling factor N to encoding memory addressing operand. 
Note that the disp8*N rules still apply when using 16-bit addressing.


Table 2-34. Compressed Displacement (DISP8*N) Affected by Embedded Broadcast

TupleType | EVEX.b | InputSize | EVEX.W | Broadcast | N (VL=128) | N (VL=256) | N (VL=512) | Comment
Full Vector (FV) | 0 | 32bit | 0 | none | 16 | 32 | 64 | Load+Op (Full Vector Dword/Qword)
Full Vector (FV) | 1 | 32bit | 0 | {1toX} | 4 | 4 | 4 |
Full Vector (FV) | 0 | 64bit | 1 | none | 16 | 32 | 64 |
Full Vector (FV) | 1 | 64bit | 1 | {1toX} | 8 | 8 | 8 |
Half Vector (HV) | 0 | 32bit | 0 | none | 8 | 16 | 32 | Load+Op (Half Vector)
Half Vector (HV) | 1 | 32bit | 0 | {1toX} | 4 | 4 | 4 |
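Table 2-34 reduces to a simple rule for the FV and HV tuple types: with EVEX.b = 1, N is the element size; otherwise N is the number of bytes in the full or half vector. The hypothetical C helpers below encode that rule and the resulting byte displacement (sign-extended disp8 times N). The HV rows of the table cover 32-bit elements only, and the tuple types of Table 2-35 would need their own lookups.

```c
#include <stdint.h>

/* N for FV/HV tuple types per Table 2-34.
 * vl_bytes: 16, 32 or 64 (VL = 128, 256, 512); elem_bytes: 4 (W0) or 8 (W1). */
int disp8_scale_fv(int vl_bytes, int elem_bytes, int evex_b, int half_vector)
{
    if (evex_b)
        return elem_bytes;             /* {1toX}: only one element is read */
    return half_vector ? vl_bytes / 2 : vl_bytes;
}

/* Effective displacement in bytes: sign-extended disp8 scaled by N. */
int32_t disp8n(int8_t disp8, int n)
{
    return (int32_t)disp8 * n;
}
```

For instance, a full 512-bit FV access at [base+128] can use disp8 = 2 (2 x 64 = 128), whereas with a {1to16} broadcast of a 32-bit element the same disp8 only reaches [base+8].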


Table 2-35. EVEX DISP8*N for Instructions Not Affected by Embedded Broadcast

TupleType | InputSize | EVEX.W | N (VL=128) | N (VL=256) | N (VL=512) | Comment
Full Vector Mem (FVM) | N/A | N/A | 16 | 32 | 64 | Load/store or subDword full vector
Tuple1 Scalar (T1S) | 8bit | N/A | 1 | 1 | 1 | 1 Tuple less than Full Vector
Tuple1 Scalar (T1S) | 16bit | N/A | 2 | 2 | 2 |
Tuple1 Scalar (T1S) | 32bit | 0 | 4 | 4 | 4 |
Tuple1 Scalar (T1S) | 64bit | 1 | 8 | 8 | 8 |
Tuple1 Fixed (T1F) | 32bit | N/A | 4 | 4 | 4 | 1 Tuple memsize not affected by EVEX.W
Tuple1 Fixed (T1F) | 64bit | N/A | 8 | 8 | 8 |
Tuple2 (T2) | 32bit | 0 | 8 | 8 | 8 | Broadcast (2 elements)
Tuple2 (T2) | 64bit | 1 | NA | 16 | 16 |
Tuple4 (T4) | 32bit | 0 | NA | 16 | 16 | Broadcast (4 elements)
Tuple4 (T4) | 64bit | 1 | NA | NA | 32 |
Tuple8 (T8) | 32bit | 0 | NA | NA | 32 | Broadcast (8 elements)
Half Mem (HVM) | N/A | N/A | 8 | 16 | 32 | SubQword Conversion
Quarter Mem (QVM) | N/A | N/A | 4 | 8 | 16 | SubDword Conversion
Oct Mem (OVM) | N/A | N/A | 2 | 4 | 8 | SubWord Conversion
Mem128 (M128) | N/A | N/A | 16 | 16 | 16 | Shift count from memory
MOVDDUP (DUP) | N/A | N/A | 8 | 32 | 64 | VMOVDDUP


2.6.6 EVEX Encoding of Broadcast/Rounding/SAE Support 

EVEX.b can provide three types of encoding context, depending on the instruction classes: 
• Embedded broadcasting of one data element from a source memory operand to the destination for vector
instructions with "load+op" semantic. 

• Static rounding control overriding MXCSR.RC for floating-point instructions with rounding semantic. 

• "Suppress All exceptions" (SAE) overriding MXCSR mask control for floating-point arithmetic instructions that 
do not have rounding semantic. 


2.6.7 Embedded Broadcast Support in EVEX 

EVEX encodes an embedded broadcast functionality that is supported on many vector instructions with 32-bit
(double word or single-precision floating-point) and 64-bit data elements, and when the source operand is from
memory. The EVEX.b (P[20]) bit is used to enable broadcast on load-op instructions. When enabled, only one element
is loaded from memory and broadcast to all other elements instead of loading the full memory size.
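In scalar C terms, embedded broadcast only changes how the memory source of a load+op instruction is read. The sketch below is hypothetical and assumes a 512-bit vector of 32-bit floats: it reads all 16 elements when EVEX.b = 0, and reads a single element and replicates it into every lane ({1to16}) when EVEX.b = 1.

```c
/* Fill the 16-lane source vector of a load+op instruction from memory. */
void load_source_fp32x16(float out[16], const float *mem, int evex_b)
{
    for (int i = 0; i < 16; i++)
        out[i] = evex_b ? mem[0]   /* broadcast: one 4-byte element read */
                        : mem[i];  /* full vector: 64 bytes read */
}
```

Because only 4 bytes are read in the broadcast case, the disp8*N factor for this form is 4 rather than 64 (Table 2-34).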

The following instruction classes do not support embedded broadcasting: 

• Instructions where only one scalar result is written to the vector destination.

• Instructions with explicit broadcast functionality provided by their opcode.

• Instructions whose semantic is a pure load or a pure store operation.


2.6.8 Static Rounding Support in EVEX 

Static rounding control embedded in the EVEX encoding system applies only to register-to-register flavors of
floating-point instructions with rounding semantic at two distinct vector lengths: (i) scalar, (ii) 512-bit. In both
cases, the field EVEX.L'L expresses rounding mode control overriding MXCSR.RC if EVEX.b is set. When EVEX.b is
set, "suppress all exceptions" is implied. The processor behaves as if all MXCSR masking controls are set.


2.6.9 SAE Support in EVEX 

The EVEX encoding system allows arithmetic floating-point instructions without rounding semantic to be encoded 
with the SAE attribute. This capability applies to scalar and 512-bit vector lengths, register-to-register only, by 
setting EVEX.b. When EVEX.b is set, "suppress all exceptions" is implied. The processor behaves as if all MXCSR 
masking controls are set. 


2.6.10 Vector Length Orthogonality 

The EVEX encoding scheme can support SIMD instructions operating at multiple vector lengths.
Many AVX-512 Foundation instructions operate at 512-bit vector length. The vector length of EVEX-encoded vector
instructions is generally determined using the L'L field in the EVEX prefix, except for 512-bit floating-point, reg-reg
instructions with rounding semantic. The table below shows the vector length corresponding to various values of
the L'L bits. When EVEX is used to encode scalar instructions, L'L is generally ignored.

When the EVEX.b bit is set for a register-register instruction with floating-point rounding semantic, the same two bits
P2[6:5] specify the rounding mode for the instruction, with implied SAE behavior. The mapping of different instruction
classes relative to the embedded broadcast/rounding/SAE control and the EVEX.L'L fields is summarized in
Table 2-36.


Table 2-36. EVEX Embedded Broadcast/Rounding/SAE and Vector Length on Vector Instructions

Position | P2[4] | P2[6:5] | P2[6:5]
Broadcast/Rounding/SAE Context | EVEX.b | EVEX.L'L | EVEX.RC
Reg-reg, FP Instructions w/ rounding semantic | Enable static rounding control (SAE implied) | Vector length implied (512 bit or scalar) | 00b: SAE + RNE; 01b: SAE + RD; 10b: SAE + RU; 11b: SAE + RZ
FP Instructions w/o rounding semantic, can cause #XF | SAE control | 00b: 128-bit; 01b: 256-bit; 10b: 512-bit; 11b: Reserved (#UD) | NA
Load+op Instructions w/ memory source | Broadcast Control | 00b: 128-bit; 01b: 256-bit; 10b: 512-bit; 11b: Reserved (#UD) | NA
Other Instructions (Explicit Load/Store/Broadcast/Gather/Scatter) | Must be 0 (otherwise #UD) | 00b: 128-bit; 01b: 256-bit; 10b: 512-bit; 11b: Reserved (#UD) | NA
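Table 2-36 can be summarized as a decision on how to read EVEX.L'L once EVEX.b and the instruction class are known. The following C sketch is a simplification that follows the table as written: the enum and struct names are hypothetical, scalar instructions (for which L'L is generally ignored) are not modeled, and the rounding codes follow the RC column (00b RNE, 01b RD, 10b RU, 11b RZ).

```c
#include <stdbool.h>

/* Instruction classes from the rows of Table 2-36 (hypothetical names). */
enum evex_class {
    REGREG_FP_ROUNDING,  /* reg-reg FP instructions with rounding semantic */
    FP_NO_ROUNDING,      /* FP instructions without rounding semantic */
    LOAD_OP_MEM,         /* load+op instructions with a memory source */
    OTHER_CLASS          /* explicit load/store/broadcast/gather/scatter */
};

struct ll_meaning {
    bool ud;      /* reserved encoding or #UD */
    int vl_bits;  /* 128, 256 or 512 when L'L is a vector length, else 0 */
    int rc;       /* 0=RNE, 1=RD, 2=RU, 3=RZ when L'L is rounding control, else -1 */
};

/* b = EVEX.b (P[20]); ll = EVEX.L'L (P[22:21]); packed forms only. */
struct ll_meaning interpret_ll(enum evex_class cls, unsigned b, unsigned ll)
{
    struct ll_meaning m = { false, 0, -1 };
    ll &= 3u;

    if (cls == REGREG_FP_ROUNDING && b) {
        m.rc = (int)ll;        /* static rounding with SAE; length implied */
        return m;
    }
    if (cls == OTHER_CLASS && b) {
        m.ud = true;           /* EVEX.b must be 0 for this class */
        return m;
    }
    if (ll == 3u) {
        m.ud = true;           /* L'L = 11b is reserved */
        return m;
    }
    m.vl_bits = 128 << ll;     /* 00b -> 128, 01b -> 256, 10b -> 512 */
    return m;
}
```

For example, a reg-reg FP instruction with rounding semantic and EVEX.b = 1, L'L = 11b selects SAE + RZ, while the same L'L value without EVEX.b is a reserved vector length.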


2.6.11 #UD Equations for EVEX 

Instructions encoded using EVEX can face three types of #UD conditions: state dependent, opcode independent and
opcode dependent.

2.6.11.1 State Dependent #UD 

In general, attempts to execute an instruction that requires OS support for an incremental extended state component
will #UD if the required state components were not enabled by the OS. Table 2-37 lists instruction categories with
respect to required processor state components. Attempts to execute a given category of instructions while the
enabled states are less than the required bit vector in XCR0 shown in Table 2-37 will cause #UD.


Table 2-37. OS XSAVE Enabling Requirements of Instruction Categories

Instruction Categories | Vector Register State Access | Required XCR0 Bit Vector [7:0]
Legacy SIMD prefix encoded Instructions (e.g., SSE) | XMM | xxxxxx11b
VEX-encoded instructions operating on YMM | YMM | xxxxx111b
EVEX-encoded 128-bit instructions | ZMM | 111xx111b
EVEX-encoded 256-bit instructions | ZMM | 111xx111b
EVEX-encoded 512-bit instructions | ZMM | 111xx111b
VEX-encoded instructions operating on opmask | k-reg | xx1xxx11b
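The XCR0 requirements in Table 2-37 amount to a mask test. In the C sketch below the masks are transcribed from the table, with 'x' bits treated as don't-care; the macro and function names are hypothetical, and reading XCR0 itself (for example with the XGETBV instruction) is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Required XCR0[7:0] bits transcribed from Table 2-37. */
#define XCR0_REQ_SSE     0x03u  /* xxxxxx11b */
#define XCR0_REQ_AVX     0x07u  /* xxxxx111b */
#define XCR0_REQ_AVX512  0xE7u  /* 111xx111b */
#define XCR0_REQ_OPMASK  0x23u  /* xx1xxx11b */

/* True if every state component the category requires is enabled;
 * otherwise an instruction of that category raises #UD. */
bool xcr0_enables(uint64_t xcr0, uint64_t required)
{
    return (xcr0 & required) == required;
}
```

An OS that has enabled only bits 2:0 (XCR0 = 07H) thus satisfies the VEX/YMM row but not any EVEX row.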


2.6.11.2 Opcode Independent #UD 

A number of bit fields in EVEX encoded instruction must obey mode-specific but opcode-independent patterns 
listed in Table 2-38. 


Table 2-38. Opcode Independent, State Dependent EVEX Bit Fields

Position | Notation | 64-bit #UD | Non-64-bit #UD
P[3:2] | - | if > 0 | if > 0
P[10] | - | if 0 | if 0
P[1:0] | EVEX.mm | if 00b | if 00b
P[7:6] | EVEX.RX | None (valid) | None (BOUND if EVEX.RX != 11b)
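The opcode-independent rules of Table 2-38 can be checked before any opcode lookup. This hypothetical C sketch takes the four bytes starting at 62H and reports whether they form a well-formed EVEX prefix, a #UD, or, outside 64-bit mode with P[7:6] != 11b, the BOUND instruction instead.

```c
#include <stdbool.h>
#include <stdint.h>

enum evex_check { EVEX_OK, EVEX_UD, EVEX_IS_BOUND };

/* p points at the 62H byte; P0 = p[1], P1 = p[2]. */
enum evex_check evex_check_fixed_bits(const uint8_t *p, bool mode64)
{
    uint8_t p0 = p[1], p1 = p[2];

    if (!mode64 && ((p0 >> 6) & 0x3) != 0x3)
        return EVEX_IS_BOUND;          /* P[7:6] != 11b: decode as BOUND */
    if ((p0 >> 2) & 0x3)
        return EVEX_UD;                /* P[3:2] reserved, must be 0 */
    if (!((p1 >> 2) & 0x1))
        return EVEX_UD;                /* P[10] must be 1 */
    if ((p0 & 0x3) == 0)
        return EVEX_UD;                /* EVEX.mm = 00b */
    return EVEX_OK;
}
```

In 64-bit mode P[7:6] carries EVEX.R and EVEX.X for any value, which is why the BOUND case only arises in the other modes.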


2.6.11.3 Opcode Dependent #UD 

This section describes legal values for the rest of the EVEX bit fields. Table 2-39 lists the #UD conditions of EVEX
prefix bit fields that encode or modify register operands.


Table 2-39. #UD Conditions of Operand-Encoding EVEX Prefix Bit Fields 


Notation 

Position 

Operand Encoding 

64-bit #UD 

Non-64-bit #UD 


EVEX.R 

P[7] 

ModRM.reg encodes k-reg 

if EVEX.R = 0 

None (BOUND if
EVEX.RX != 11b)

ModRM.reg is opcode extension 

None (ignored) 

ModRM.reg encodes all other registers 

None (valid) 

EVEX.X 

P[6] 

ModRM.r/m encodes ZMM/YMM/XMM 

None (valid) 

ModRM.r/m encodes k-reg or GPR 

None (ignored) 

ModRM.r/m without SIB/VSIB 

None (ignored) 

ModRM.r/m with SIB/VSIB 

None (valid) 

EVEX.B 

P[5] 

ModRM.r/m encodes k-reg 

None (ignored) 

None (ignored) 

ModRM.r/m encodes other registers 

None (valid) 

ModRM.r/m base present 

None (valid) 

ModRM.r/m base not present 

None (ignored) 

EVEX.R'

P[4] 

ModRM.reg encodes k-reg or GPR 

if 0 

None (ignored) 

ModRM.reg is opcode extension 

None (ignored) 

ModRM.reg encodes ZMM/YMM/XMM 

None (valid) 

EVEX.vvvv 

P[14:11] 

vvvv encodes ZMM/YMM/XMM 

None (valid) 

None (valid) 

P[14] ignored 

Otherwise 

if != 1111b

if != 1111b 

EVEX.V'

P[19] 

Encodes ZMM/YMM/XMM 

None (valid) 

if 0 

Otherwise 

if 0 

if 0 


Table 2-40 lists the #UD conditions of instruction encoding of opmask registers using EVEX.aaa and EVEX.z.

Table 2-40. #UD Conditions of Opmask Related Encoding Field 


Notation 

Position 

Operand Encoding 

64-bit #UD 

Non-64-bit #UD 

EVEX.aaa 

P[18:16] 

Instructions do not use opmask for conditional processing^ 

if aaa 1= 000b 

if aaa 1= 000b 

Opmask used as conditional processing mask and updated 
at completion^. 

if aaa = 000b 

if aaa = 000b; 

Opmask used as conditional processing. 

None (valid; Note 3) 

None (valid; Note 3) 

EVEX.z 

P[23] 

Vector instruction using opmask as source or destination (Note 4). 

if EVEX.z != 0 

if EVEX.z != 0 

Store instructions or gather/scatter instructions. 

if EVEX.z != 0 

if EVEX.z != 0 

Instruction supporting conditional processing mask with 
EVEX.aaa = 000b. 

if EVEX.z != 0 

if EVEX.z != 0 


NOTES: 

1. E.g., VBROADCASTMxxx, VPMOVM2x, VPMOVx2M. 

2. E.g., Gather/Scatter family. 

3. aaa can take any value. A value of 000b indicates that there is no masking on the instruction; in this case, all elements are 
processed as if there were a mask of all ones, regardless of the actual value in K0. 

4. E.g., VFPCLASSPD/PS, VPCMPB/D/Q/W family, VPMOVM2x, VPMOVx2M. 
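Note 3 and the EVEX.z rows of Table 2-40 can be sketched in C as follows. This is an illustration, not a decoder: the opmask registers are modeled as an array of eight 64-bit values, the caller is assumed to have already placed the instruction in one of the table's EVEX.z rows, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Rows of Table 2-40 that constrain EVEX.z (hypothetical names). */
typedef enum {
    Z_ROW_OPMASK_OPERAND,  /* opmask is a source or destination (Note 4) */
    Z_ROW_STORE_OR_GATHER, /* store or gather/scatter instruction */
    Z_ROW_OTHER            /* supports a conditional processing mask */
} z_row;

/* Note 3: aaa = 000b means no masking, i.e. behave as if the mask
   were all ones, regardless of the value held in K0. */
static uint64_t effective_mask(const uint64_t k[8], unsigned aaa) {
    assert(aaa < 8);
    return aaa == 0 ? ~UINT64_C(0) : k[aaa];
}

/* EVEX.z (P[23]) must be 0 for the first two rows, and also for
   instructions that support masking but are encoded with aaa = 000b. */
static bool evex_z_ud(z_row row, unsigned aaa, unsigned z) {
    if (z == 0)
        return false;
    return row != Z_ROW_OTHER || aaa == 0;
}
```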


Table 2-41 lists the #UD conditions of EVEX bit fields that depend on the context of EVEX.b. 

Table 2-41. #UD Conditions Dependent on EVEX.b Context 


Notation 

Position 

Operand Encoding 

64-bit #UD 

Non-64-bit #UD 


EVEX.L'Lb 

P[22:20] 

Reg-reg, FP instructions with rounding semantic. 

None (valid; Note 1) 

None (valid; Note 1) 



Other reg-reg, FP instructions that can cause #XF. 

None (valid; Note 2) 

None (valid; Note 2) 



Other reg-mem instructions in Table 2-34. 

None (valid; Note 3) 

None (valid; Note 3) 



Other instruction classes (Note 4) in Table 2-35. 

If EVEX.b > 0 

If EVEX.b > 0 


NOTES: 

1. L'L specifies rounding control (see Table 2-36); supports {er} syntax. 

2. L'L specifies vector length (see Table 2-36); supports {sae} syntax. 

3. L'L specifies vector length (see Table 2-36); supports embedded broadcast syntax. 

4. L'L specifies either vector length or ignored. 
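The following C sketch shows one way to read Table 2-41: it maps an instruction class and EVEX.b to the meaning of EVEX.L'L. The class and result names are hypothetical, and the sketch simplifies the EVEX.b = 0 case by always reporting L'L as the vector length, although Note 4 allows L'L to be ignored for some classes.

```c
#include <assert.h>

/* Instruction classes of Table 2-41 (hypothetical names). */
typedef enum {
    CLASS_REGREG_ROUNDING, /* reg-reg FP with rounding semantic */
    CLASS_REGREG_XF,       /* other reg-reg FP that can cause #XF */
    CLASS_REGMEM,          /* other reg-mem instructions (Table 2-34) */
    CLASS_OTHER            /* other instruction classes (Table 2-35) */
} evex_class;

typedef enum {
    LL_VECTOR_LENGTH, /* EVEX.b = 0: L'L is the vector length */
    LL_ROUNDING,      /* Note 1: L'L is rounding control, {er} */
    LL_VL_SAE,        /* Note 2: L'L is vector length, {sae} */
    LL_VL_BROADCAST,  /* Note 3: L'L is vector length, embedded broadcast */
    LL_UD             /* EVEX.b set for a class that does not allow it */
} ll_meaning;

static ll_meaning interpret_llb(evex_class c, unsigned b) {
    assert(b <= 1);
    if (b == 0)
        return LL_VECTOR_LENGTH;
    switch (c) {
    case CLASS_REGREG_ROUNDING: return LL_ROUNDING;
    case CLASS_REGREG_XF:       return LL_VL_SAE;
    case CLASS_REGMEM:          return LL_VL_BROADCAST;
    default:                    return LL_UD;
    }
}
```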

2.6.12 Device Not Available 

EVEX-encoded instructions follow the same rules as other SIMD instructions when it comes to generating the #NM (Device 
Not Available) exception. In particular, #NM is generated when CR0.TS[bit 3] = 1. 


2.6.13 Scalar Instructions 

EVEX-encoded scalar SIMD instructions can access up to 32 registers in 64-bit mode. Scalar instructions support 
masking (using the least significant bit of the opmask register), but broadcasting is not supported. 
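As a minimal sketch of scalar masking, the C function below models only the low element of a masked scalar double-precision add such as VADDSD. Only bit 0 of the opmask is consulted; the `zeroing` flag stands in for EVEX.z, and the function name is hypothetical. The bits above the low element, which come from the first source operand for VADDSD, are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low-element result of a masked scalar add such as VADDSD.
   Only bit 0 of the opmask k is used. When it is clear, the element
   is zeroed (zeroing-masking) or keeps the old destination value
   (merging-masking). */
static double masked_scalar_add(double old_dst, double a, double b,
                                uint64_t k, bool zeroing) {
    if (k & 1u)
        return a + b;
    return zeroing ? 0.0 : old_dst;
}
```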


2.7 EXCEPTION CLASSIFICATIONS OF EVEX-ENCODED INSTRUCTIONS 

The exception behavior of EVEX-encoded instructions can be classified into the classes shown in the rest of this 
section. The classification of EVEX-encoded instructions follows a similar framework to that of AVX and AVX2 
instructions using the VEX prefix. Exception types for EVEX-encoded instructions are named in the style of 
"E##" or with a suffix, "E##XX". The "##" designation generally follows that of AVX/AVX2 instructions. The 
majority of EVEX-encoded instructions with "Load+op" semantics support memory fault suppression, which is 
represented by E##. Instructions with "Load+op" semantics that do not support fault suppression are named 
"E##NF". A summary of exception classes by class name is shown in Table 2-42. 


Table 2-42. EVEX-Encoded Instruction Exception Class Summary 

Exception Class | Instruction Set | Mem Arg | Floating-Point Exceptions (#XM) 
Type E1 | Vector Moves/Load/Stores | Explicitly aligned, w/ fault suppression | None 
Type E1NF | Vector Non-temporal Stores | Explicitly aligned, no fault suppression | None 
Type E2 | FP Vector Load+op | Support fault suppression | Yes 
Type E2NF | FP Vector Load+op | No fault suppression | Yes 
Type E3 | FP Scalar/Partial Vector, Load+Op | Support fault suppression | Yes 
Type E3NF | FP Scalar/Partial Vector, Load+Op | No fault suppression | Yes 
Type E4 | Integer Vector Load+op | Support fault suppression | No 
Type E4NF | Integer Vector Load+op | No fault suppression | No 
Type E5 | Legacy-like Promotion | Varies, support fault suppression | No 
Type E5NF | Legacy-like Promotion | Varies, no fault suppression | No 
Type E6 | Post AVX Promotion | Varies, w/ fault suppression | No 
Type E6NF | Post AVX Promotion | Varies, no fault suppression | No 
Type E7NM | Register-to-register op | None | None 
Type E9NF | Miscellaneous 128-bit | Vector-length specific, no fault suppression | None 
Type E10 | Non-XF Scalar | Vector length ignored, w/ fault suppression | None 
Type E10NF | Non-XF Scalar | Vector length ignored, no fault suppression | None 
Type E11 | VCVTPH2PS | Half vector length, w/ fault suppression | Yes 
Type E11NF | VCVTPS2PH | Half vector length, no fault suppression | Yes 
Type E12 | Gather and Scatter Family | VSIB addressing, w/ fault suppression | None 
Type E12NP | Gather and Scatter Prefetch Family | VSIB addressing, w/o page fault | None 
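The class-naming scheme in Table 2-42 can be sketched as a small C parser for class names. The meanings given to the NM and NP suffixes (no memory operand, no page fault) are inferred from the descriptions in Table 2-42, and the ".nb" subclass (no embedded broadcast) comes from the notes to Table 2-43; the type and function names are hypothetical. Suffixes not handled here, such as ".128", are rejected.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Decoded form of a class name such as "E4NF.nb" (hypothetical type). */
typedef struct {
    unsigned number;        /* the "##" part, e.g. 4 for E4NF */
    bool no_fault_suppress; /* "NF" suffix */
    bool no_memory;         /* "NM" suffix, e.g. E7NM (inferred meaning) */
    bool no_page_fault;     /* "NP" suffix, e.g. E12NP (inferred meaning) */
    bool no_broadcast;      /* ".nb" subclass, e.g. E4.nb */
} exc_class;

/* Returns true and fills *out if s is a well-formed class name. */
static bool parse_exc_class(const char *s, exc_class *out) {
    assert(s != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    if (s[0] != 'E' || !isdigit((unsigned char)s[1]))
        return false;
    char *end;
    out->number = (unsigned)strtoul(s + 1, &end, 10);
    if (strncmp(end, "NF", 2) == 0) { out->no_fault_suppress = true; end += 2; }
    else if (strncmp(end, "NM", 2) == 0) { out->no_memory = true; end += 2; }
    else if (strncmp(end, "NP", 2) == 0) { out->no_page_fault = true; end += 2; }
    if (strncmp(end, ".nb", 3) == 0) { out->no_broadcast = true; end += 3; }
    return *end == '\0'; /* anything left over is an unknown suffix */
}
```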


Table 2-43 lists EVEX-encoded instruction mnemonics by exception class. 


Table 2-43. EVEX Instructions in each Exception Class 


Exception Class 

Instruction 

Type E1 

VMOVAPD, VMOVAPS, VMOVDQA32, VMOVDQA64 

Type E1NF 

VMOVNTDQ, VMOVNTDQA, VMOVNTPD, VMOVNTPS 

Type E2 

VADDPD, VADDPS, VCMPPD, VCMPPS, VCVTDQ2PS, VCVTPD2DQ, VCVTPD2PS, VCVTPS2DQ, VCVTTPD2DQ, 
VCVTTPS2DQ, VDIVPD, VDIVPS, VFMADDxxxPD, VFMADDxxxPS, VFMSUBADDxxxPD, VFMSUBADDxxxPS, 
VFMSUBxxxPD, VFMSUBxxxPS, VFNMADDxxxPD, VFNMADDxxxPS, VFNMSUBxxxPD, VFNMSUBxxxPS, VMAXPD, 
VMAXPS, VMINPD, VMINPS, VMULPD, VMULPS, VSQRTPD, VSQRTPS, VSUBPD, VSUBPS 

VCVTPD2QQ, VCVTPD2UQQ, VCVTPD2UDQ, VCVTPS2UDQ, VCVTQQ2PD, VCVTQQ2PS, VCVTTPD2DQ, 
VCVTTPD2QQ, VCVTTPD2UDQ, VCVTTPD2UQQ, VCVTTPS2DQ, VCVTTPS2UDQ, VCVTUDQ2PS, VCVTUQQ2PD, 
VCVTUQQ2PS, VFIXUPIMMPD, VFIXUPIMMPS, VGETEXPPD, VGETEXPPS, VGETMANTPD, VGETMANTPS, VRANGEPD, 
VRANGEPS, VREDUCEPD, VREDUCEPS, VRNDSCALEPD, VRNDSCALEPS, VSCALEFPD, VSCALEFPS, VRCP28PD, 
VRCP28PS, VRSQRT28PD, VRSQRT28PS 

Type E3 

VADDSD, VADDSS, VCMPSD, VCMPSS, VCVTPS2PD, VCVTSD2SS, VCVTSS2SD, VDIVSD, VDIVSS, VMAXSD, VMAXSS, 
VMINSD, VMINSS, VMULSD, VMULSS, VSQRTSD, VSQRTSS, VSUBSD, VSUBSS 

VCVTPS2QQ, VCVTPS2UQQ, VCVTTPS2QQ, VCVTTPS2UQQ, VFMADDxxxSD, VFMADDxxxSS, VFMSUBxxxSD, 
VFMSUBxxxSS, VFNMADDxxxSD, VFNMADDxxxSS, VFNMSUBxxxSD, VFNMSUBxxxSS, VFIXUPIMMSD, 
VFIXUPIMMSS, VGETEXPSD, VGETEXPSS, VGETMANTSD, VGETMANTSS, VRANGESD, VRANGESS, VREDUCESD, 
VREDUCESS, VRNDSCALESD, VRNDSCALESS, VSCALEFSD, VSCALEFSS, VRCP28SD, VRCP28SS, VRSQRT28SD, 
VRSQRT28SS 

Type E3NF 

VCOMISD, VCOMISS, VCVTSD2SI, VCVTSI2SD, VCVTSI2SS, VCVTSS2SI, VCVTTSD2SI, VCVTTSS2SI, VUCOMISD, 
VUCOMISS 

VCVTSD2USI, VCVTTSD2USI, VCVTSS2USI, VCVTTSS2USI, VCVTUSI2SD, VCVTUSI2SS 

Type E4 

VANDPD, VANDPS, VANDNPD, VANDNPS, VORPD, VORPS, VPABSD, VPABSQ, VPADDD, VPADDQ, VPANDD, VPANDQ, 
VPANDND, VPANDNQ, VPCMPEQD, VPCMPEQQ, VPCMPGTD, VPCMPGTQ, VPMAXSD, VPMAXSQ, VPMAXUD, 
VPMAXUQ, VPMINSD, VPMINSQ, VPMINUD, VPMINUQ, VPMULLD, VPMULLQ, VPMULUDQ, VPMULDQ, VPORD, 
VPORQ, VPSUBD, VPSUBQ, VPXORD, VPXORQ, VXORPD, VXORPS, VPSLLVD, VPSLLVQ, 
VBLENDMPD, VBLENDMPS, VPBLENDMD, VPBLENDMQ, VFPCLASSPD, VFPCLASSPS, VPCMPD, VPCMPQ, VPCMPUD, 
VPCMPUQ, VPLZCNTD, VPLZCNTQ, VPROLD, VPROLQ, (VPSLLD, VPSLLQ, VPSRAD, VPSRAQ, VPSRLD, VPSRLQ) (Note 1), 
VPTERNLOGD, VPTERNLOGQ, VPTESTMD, VPTESTMQ, VPTESTNMD, VPTESTNMQ, VRCP14PD, VRCP14PS, 
VRSQRT14PD, VRSQRT14PS, VPCONFLICTD, VPCONFLICTQ, VPSRAVW, VPSRAVD, VPSRAVW, VPSRAVQ, 
VPMADD52LUQ, VPMADD52HUQ 

Type E4.nb (Note 2) 

VMOVUPD, VMOVUPS, VMOVDQU8, VMOVDQU16, VMOVDQU32, VMOVDQU64, VPCMPB, VPCMPW, VPCMPUB, 
VPCMPUW, VEXPANDPD, VEXPANDPS, VPCOMPRESSD, VPCOMPRESSQ, VPEXPANDD, VPEXPANDQ, 
VCOMPRESSPD, VCOMPRESSPS, VPABSB, VPABSW, VPADDB, VPADDW, VPADDSB, VPADDSW, VPADDUSB, 
VPADDUSW, VPAVGB, VPAVGW, VPCMPEQB, VPCMPEQW, VPCMPGTB, VPCMPGTW, VPMAXSB, VPMAXSW, 
VPMAXUB, VPMAXUW, VPMINSB, VPMINSW, VPMINUB, VPMINUW, VPMULHRSW, VPMULHUW, VPMULHW, 
VPMULLW, VPSUBB, VPSUBW, VPSUBSB, VPSUBSW, VPTESTMB, VPTESTMW, VPTESTNMB, VPTESTNMW, VPSLLW, 
VPSRAW, VPSRLW, VPSLLVW, VPSRLVW 

Type E4NF 

VPACKSSDW, VPACKUSDW, VPSHUFD, VPUNPCKHDQ, VPUNPCKHQDQ, VPUNPCKLDQ, VPUNPCKLQDQ, VSHUFPD, 
VSHUFPS, VUNPCKHPD, VUNPCKHPS, VUNPCKLPD, VUNPCKLPS, VPERMD, VPERMPS, VPERMPD, VPERMQ, 
VALIGND, VALIGNQ, VPERMI2D, VPERMI2PS, VPERMI2PD, VPERMI2Q, VPERMT2D, VPERMT2PS, VPERMT2Q, 
VPERMT2PD, VPERMILPD, VPERMILPS, VSHUFI32X4, VSHUFI64X2, VSHUFF32X4, VSHUFF64X2, 
VPMULTISHIFTQB 

Type E4NF.nb (Note 2) 

VDBPSADBW, VPACKSSWB, VPACKUSWB, VPALIGNR, VPMADDWD, VPMADDUBSW, VMOVSHDUP, VMOVSLDUP, 
VPSADBW, VPSHUFB, VPSHUFHW, VPSHUFLW, VPSLLDQ, VPSRLDQ, VPSLLW, VPSRAW, VPSRLW, (VPSLLD, 
VPSLLQ, VPSRAD, VPSRAQ, VPSRLD, VPSRLQ) (Note 3), VPUNPCKHBW, VPUNPCKHWD, VPUNPCKLBW, VPUNPCKLWD, 
VPERMW, VPERMI2W, VPERMT2W, VPERMB, VPERMI2B, VPERMT2B 

Type E5 

VCVTDQ2PD, VPMOVSXBW, VPMOVSXBD, VPMOVSXBQ, VPMOVSXWD, VPMOVSXWQ, VPMOVSXDQ, 
VPMOVZXBW, VPMOVZXBD, VPMOVZXBQ, VPMOVZXWD, VPMOVZXWQ, VPMOVZXDQ 

VCVTUDQ2PD 

Type E5NF 

VMOVDDUP 

Type E6 

VBROADCASTSS, VBROADCASTSD, VBROADCASTF32X4, VBROADCASTI32X4, VPBROADCASTB, VPBROADCASTD, 
VPBROADCASTW, VPBROADCASTQ, 
VBROADCASTF32X2, VBROADCASTF32X4, VBROADCASTF64X2, VBROADCASTF32X8, VBROADCASTF64X4, 
VBROADCASTI32X2, VBROADCASTI32X4, VBROADCASTI64X2, VBROADCASTI32X8, VBROADCASTI64X4, 
VFPCLASSSD, VFPCLASSSS, VPMOVQB, VPMOVSQB, VPMOVUSQB, VPMOVQW, VPMOVSQW, VPMOVUSQW, 
VPMOVQD, VPMOVSQD, VPMOVUSQD, VPMOVDB, VPMOVSDB, VPMOVUSDB, VPMOVDW, VPMOVSDW, 
VPMOVUSDW 

Type E6NF 

VEXTRACTF32X4, VEXTRACTF64X2, VEXTRACTF32X8, VINSERTF32X4, VINSERTF64X2, VINSERTF64X4, 
VINSERTF32X8, VINSERTI32X4, VINSERTI64X2, VINSERTI64X4, VINSERTI32X8, VEXTRACTI32X4, 
VEXTRACTI64X2, VEXTRACTI32X8, VEXTRACTI64X4, VPBROADCASTMB2Q, VPBROADCASTMW2D, VPMOVWB, 
VPMOVSWB, VPMOVUSWB 

Type E7NM.128 (Note 4) 

VMOVLHPS, VMOVHLPS 

Type E7NM 

(VPBROADCASTD, VPBROADCASTQ, VPBROADCASTB, VPBROADCASTW) (Note 5), VPMOVM2B, VPMOVM2D, VPMOVM2Q, 
VPMOVM2W, VPMOVB2M, VPMOVD2M, VPMOVQ2M, VPMOVW2M 

Type E9NF 

VEXTRACTPS, VINSERTPS, VMOVHPD, VMOVHPS, VMOVLPD, VMOVLPS, VMOVD, VMOVQ, VPEXTRB, VPEXTRD, 
VPEXTRW, VPEXTRQ, VPINSRB, VPINSRD, VPINSRW, VPINSRQ 

Type E10 

VMOVSD, VMOVSS, VRCP14SD, VRCP14SS, VRSQRT14SD, VRSQRT14SS 

Type E10NF 

(VCVTSI2SD, VCVTUSI2SD) (Note 6) 

Type E11 

VCVTPH2PS, VCVTPS2PH 

Type E12 

VGATHERDPS, VGATHERDPD, VGATHERQPS, VGATHERQPD, VPGATHERDD, VPGATHERDQ, VPGATHERQD, 
VPGATHERQQ, VPSCATTERDD, VPSCATTERDQ, VPSCATTERQD, VPSCATTERQQ, VSCATTERDPD, VSCATTERDPS, 
VSCATTERQPD, VSCATTERQPS 

Type E12NP 

VGATHERPF0DPD, VGATHERPF0DPS, VGATHERPF0QPD, VGATHERPF0QPS, VGATHERPF1DPD, VGATHERPF1DPS, 
VGATHERPF1QPD, VGATHERPF1QPS, VSCATTERPF0DPD, VSCATTERPF0DPS, VSCATTERPF0QPD, 
VSCATTERPF0QPS, VSCATTERPF1DPD, VSCATTERPF1DPS, VSCATTERPF1QPD, VSCATTERPF1QPS 


NOTES: 

1. Operand encoding FVI tupletype with Immediate. 

2. Embedded broadcast is not supported with the ".nb" suffix. 

3. Operand encoding M128 tupletype. 

4. #UD raised if EVEX.L'L != 00b (VL=128). 

5. The source operand is a general purpose register. 

6. W0 encoding only. 
2.7.1 Exceptions Type E1 and E1NF of EVEX-Encoded Instructions 

EVEX-encoded instructions with memory alignment restrictions that support memory fault suppression follow 
exception class E1. 
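With memory fault suppression, a memory fault is not reported for an element whose opmask bit is 0. The C sketch below emulates that behavior for a masked load of 32-bit elements with merging-masking: a masked-off element is never read, so an invalid address in that position cannot fault. The function name and signature are hypothetical, and alignment, which class E1 checks separately, is ignored.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Emulated masked load of n 32-bit elements with merging-masking.
   Only elements whose mask bit is set are read from src; masked-off
   elements keep their old value in dst and their memory is never
   touched, which is the effect fault suppression provides. */
static void masked_load_u32(uint32_t *dst, const uint32_t *src,
                            uint64_t k, size_t n) {
    assert(dst != NULL && n <= 64);
    for (size_t i = 0; i < n; i++) {
        if ((k >> i) & 1u)
            dst[i] = src[i];
    }
}
```

Calling it with a mask of 0 never dereferences `src`, mirroring a fully masked access that raises no page fault.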


Table 2-44. Type E1 Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, 

#UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, #NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is in 
a non-canonical form. 

General Protection, 
#GP(0) 



X 

X 

EVEX.512: Memory operand is not 64-byte aligned. 

EVEX.256: Memory operand is not 32-byte aligned. 

EVEX.128: Memory operand is not 16-byte aligned. 



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault 
#PF(fault-code) 


X 

X 

X 

If fault suppression not set, and a page fault. 
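The three #GP(0) alignment rows of Table 2-44 reduce to requiring the memory operand address to be a multiple of the vector length in bytes. A minimal C sketch of that check follows; the function name is hypothetical, and the effective address is assumed to be passed in as a plain integer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Alignment rows of Table 2-44: the memory operand must be aligned
   to the full vector length (16, 32 or 64 bytes for EVEX.128,
   EVEX.256 and EVEX.512). Returns true if #GP(0) would apply. */
static bool e1_misaligned(uint64_t addr, unsigned vl_bits) {
    assert(vl_bits == 128 || vl_bits == 256 || vl_bits == 512);
    uint64_t bytes = vl_bits / 8;      /* 16, 32 or 64: a power of two */
    return (addr & (bytes - 1)) != 0;  /* low bits must all be zero */
}
```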
EVEX-encoded instructions with memory alignment restrictions that do not support memory fault suppression 
follow exception class E1NF. 


Table 2-45. Type E1NF Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, 

#UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, #NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 

X 

EVEX.512: Memory operand is not 64-byte aligned. 

EVEX.256: Memory operand is not 32-byte aligned. 

EVEX.128: Memory operand is not 16-byte aligned. 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to FFFFH. 

Page Fault 
#PF(fault-code) 


X 

X 

X 

For a page fault. 
2.7.2 Exceptions Type E2 of EVEX-Encoded Instructions 

EVEX-encoded vector instructions with arithmetic semantics follow exception class E2. 


Table 2-46. Type E2 Class Exception Conditions 


Exception 

Real 

Virtual 8086 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, 
#UD 

X 

X 



If EVEX prefix present. 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 0. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, #NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is in a 
non-canonical form. 

General Protection, #GP(0) 



X 


If fault suppression not set, and an illegal memory operand effective address in the CS, 
DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault 
#PF(fault-code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

SIMD Floating-point Exception, #XM 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception, {sae} or {er} not set, and CR4.OSXMMEXCPT[bit 10] = 1. 
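The #UD and #XM rows of Table 2-46 both start from an unmasked SIMD floating-point exception and differ only in CR4.OSXMMEXCPT (bit 10). The C sketch below selects between them. It assumes that {sae} or {er} suppresses reporting in both cases; the #XM row says so explicitly, while the #UD row does not mention it. The enum and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef enum { FP_NONE, FP_UD, FP_XM } fp_fault;

/* Chooses between #UD and #XM for an unmasked SIMD FP exception,
   based on CR4.OSXMMEXCPT (bit 10 of CR4). */
static fp_fault simd_fp_fault(uint64_t cr4, bool unmasked_fp_exception,
                              bool sae_or_er) {
    if (!unmasked_fp_exception || sae_or_er)
        return FP_NONE;                 /* nothing to report */
    return ((cr4 >> 10) & 1u) ? FP_XM : FP_UD;
}
```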
2.7.3 Exceptions Type E3 and E3NF of EVEX-Encoded Instructions 

EVEX-encoded scalar instructions with arithmetic semantics that support memory fault suppression follow exception 
class E3. 


Table 2-47. Type E3 Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 0. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0) 



X 


If fault suppression not set, and an illegal memory operand effective address in 
the CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF(fault-code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes 
or less is made while the current privilege level is 3. 

SIMD Floating-point 
Exception, #XM 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception, {sae} or {er} not set, and 
CR4.OSXMMEXCPT[bit 10] = 1. 
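The #AC(0) row depends on alignment checking being enabled, which in IA-32 means both CR0.AM (bit 18) and EFLAGS.AC (bit 18) are set; that definition comes from the system programming volumes rather than from this table. A minimal C sketch under that assumption, with a hypothetical function name and reference sizes restricted to 1, 2, 4 or 8 bytes:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* #AC(0) condition: alignment checking enabled (CR0.AM and
   EFLAGS.AC, both bit 18), CPL 3, and an unaligned reference of
   8 bytes or less. */
static bool ac_fault(uint64_t cr0, uint64_t rflags, unsigned cpl,
                     uint64_t addr, unsigned size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    bool enabled = ((cr0 >> 18) & 1u) && ((rflags >> 18) & 1u);
    bool unaligned = (addr & (size - 1u)) != 0;
    return enabled && cpl == 3 && unaligned;
}
```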
EVEX-encoded scalar instructions with arithmetic semantics that do not support memory fault suppression follow 
exception class E3NF. 


Table 2-48. Type E3NF Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception and CR4.OSXMMEXCPT[bit 10] = 0. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to 
FFFFH. 

Page Fault #PF(fault-code) 


X 

X 

X 

For a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes 
or less is made while the current privilege level is 3. 

SIMD Floating-point 
Exception, #XM 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception, {sae} or {er} not set, and 
CR4.OSXMMEXCPT[bit 10] = 1. 
2.7.4 Exceptions Type E4 and E4NF of EVEX-Encoded Instructions 

EVEX-encoded vector instructions that cause no SIMD FP exception and support memory fault suppression follow 
exception class E4. 


Table 2-49. Type E4 Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0 and in E4.nb subclass (see E4.nb entries in Table 2-43). 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0) 



X 


If fault suppression not set, and an illegal memory operand effective address in 
the CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF(fault-code) 


X 

X 

X 

If fault suppression not set, and a page fault. 
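The E4.nb rule (#UD when EVEX.b != 0) requires knowing whether a mnemonic belongs to that subclass. The C sketch below does the membership test with a sorted lookup table and the standard `bsearch`. The table holds only a few E4.nb mnemonics taken from Table 2-43, not the full list, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* A few E4.nb mnemonics from Table 2-43 (not the full list),
   kept in strcmp order so bsearch can be used. */
static const char *const e4_nb[] = {
    "VMOVDQU16", "VMOVDQU32", "VMOVDQU64", "VMOVDQU8", "VMOVUPD",
    "VMOVUPS",   "VPADDB",    "VPADDW",    "VPCMPB",   "VPCMPW",
};

static int cmp_str(const void *a, const void *b) {
    return strcmp(*(const char *const *)a, *(const char *const *)b);
}

/* #UD if EVEX.b != 0 and the instruction is in the E4.nb subclass. */
static bool e4_nb_ud(const char *mnemonic, unsigned evex_b) {
    assert(mnemonic != NULL);
    if (evex_b == 0)
        return false;
    return bsearch(&mnemonic, e4_nb, sizeof e4_nb / sizeof e4_nb[0],
                   sizeof e4_nb[0], cmp_str) != NULL;
}
```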
EVEX-encoded vector instructions that neither cause SIMD FP exceptions nor support memory fault suppression 
follow exception class E4NF. 


Table 2-50. Type E4NF Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0 and in E4NF.nb subclass (see E4NF.nb entries in Table 2-43). 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to 
FFFFH. 

Page Fault #PF(fault-code) 


X 

X 

X 

For a page fault. 
2.7.5 Exceptions Type E5 and E5NF 

EVEX-encoded scalar/partial-vector instructions that cause no SIMD FP exception and support memory fault 
suppression follow exception class E5. 


Table 2-51. Type E5 Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0) 



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF(fault-code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes or 
less is made while the current privilege level is 3. 


EVEX-encoded scalar/partial-vector instructions that neither cause SIMD FP exceptions nor support memory fault 
suppression follow exception class E5NF. 




Table 2-52. Type E5NF Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ If EVEX.L'L != 10b (VL=512). 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


If an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to FFFFH. 

Page Fault #PF(fault-code) 


X 

X 

X 

For a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes or 
less is made while the current privilege level is 3. 
2.7.6 Exceptions Type E6 and E6NF 

EVEX-encoded instructions that cause no SIMD FP exception and support memory fault suppression follow exception 
class E6. 


Table 2-53. Type E6 Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ If EVEX.L'L != 10b (VL=512). 



X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 



X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 



X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0) 



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

Page Fault #PF(fault-code) 



X 

X 

If fault suppression not set, and a page fault. 

Alignment Check 
#AC(0) 



X 

X 

For 4 or 8 byte memory references if alignment checking is enabled and an 
unaligned memory reference of 8 bytes or less is made while the current privilege 
level is 3. 
EVEX-encoded instructions that neither cause SIMD FP exceptions nor support memory fault suppression follow 
exception class E6NF. 


Table 2-54. Type E6NF Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 




X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ If EVEX.L'L != 10b (VL=512). 




X 

X 

If preceded by a LOCK prefix (F0H). 




X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 




X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 



X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 




X 

If the memory address is in a non-canonical form. 

Page Fault #PF(fault-code) 



X 

X 

For a page fault. 

Alignment Check 
#AC(0) 



X 

X 

For 4 or 8 byte memory references if alignment checking is enabled and an 
unaligned memory reference of 8 bytes or less is made while the current privilege 
level is 3. 
2.7.7 Exceptions Type E7NM 

EVEX-encoded instructions that cause no SIMD FP exception and do not reference memory follow exception class 
E7NM. 


Table 2-55. Type E7NM Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0. 

If any one of the following conditions applies: 

■ State requirement (Table 2-37) not met. 

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

■ Opmask encoding #UD condition of Table 2-40. 

■ If EVEX.b != 0. 

■ Instruction-specific EVEX.L'L restriction not met. 

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H). 



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'. 

Device Not Available, 
#NM 



X 

X 

If CR0.TS[bit 3]=1. 
2.7.8 Exceptions Type E9 and E9NF 

EVEX-encoded vector or partial-vector instructions that cause no SIMD FP exception and support memory 
fault suppression follow exception class E9. 


Table 2-56. Type E9 Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

■ If EVEX.L'L != 00b (VL=128).

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0)



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF(fault- 
code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes or 
less is made while the current privilege level is 3. 
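The E9/E9NF rows "If EVEX.b != 0" and "If EVEX.L'L != 00b (VL=128)" depend only on the last payload byte of the EVEX prefix. A minimal sketch, assuming the commonly documented layout of that byte (P2): z in bit 7, L'L in bits 6:5, b in bit 4, V' in bit 3 and aaa in bits 2:0 (Section 2.6.1 remains the authoritative description). The helper names are hypothetical.

```python
def evex_p2_fields(p2: int) -> dict:
    """Split EVEX payload byte P2 (fourth byte of the 62H prefix) into fields."""
    return {
        "z": (p2 >> 7) & 1,        # zeroing/merging control
        "LL": (p2 >> 5) & 0b11,    # EVEX.L'L vector length
        "b": (p2 >> 4) & 1,        # EVEX.b
        "aaa": p2 & 0b111,         # opmask register specifier
    }


def e9_prefix_ud(p2: int) -> bool:
    """True if P2 alone triggers an E9/E9NF #UD row:
    EVEX.b != 0, or EVEX.L'L != 00b (only VL=128 is allowed)."""
    fields = evex_p2_fields(p2)
    return fields["b"] != 0 or fields["LL"] != 0b00
```

With this layout, P2 = 08H passes both checks, 18H sets EVEX.b, and 48H selects L'L = 10b (VL=512); the last two would raise #UD under E9.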
EVEX-encoded vector or partial-vector instructions that must be encoded with EVEX.L'L = 0, do not cause SIMD FP
exceptions, and do not support memory fault suppression follow exception class E9NF.


Table 2-57. Type E9NF Class Exception Conditions 


Exception 

Real

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

■ If EVEX.L'L != 00b (VL=128).

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment.




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0)



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to FFFFH.

Page Fault #PF(fault- 
code) 


X 

X 

X 

For a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference is made while 
the current privilege level is 3. 
2.7.9 Exceptions Type E10

EVEX-encoded scalar instructions that ignore the EVEX.L'L vector length encoding, do not cause SIMD FP exceptions,
and support memory fault suppression follow exception class E10.


Table 2-58. Type E10 Class Exception Conditions 


Exception 

Real

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0)



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF(fault- 
code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes or 
less is made while the current privilege level is 3. 
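The qualifier "If fault suppression not set" reflects that, for instructions supporting memory fault suppression, faults are suppressed for elements whose opmask bit is 0, because those elements are not accessed. The sketch below models this for a merging masked load. The page model (a dict holding only mapped element addresses), the `PageFault` class and `masked_load` are hypothetical stand-ins, not a processor interface.

```python
class PageFault(Exception):
    """Stand-in for #PF(fault-code) in this model."""


def masked_load(memory: dict[int, int], base: int, elem_size: int,
                mask: int, merge_into: list[int]) -> list[int]:
    """Merging masked load of len(merge_into) elements starting at `base`.

    `memory` maps the address of each readable element to its value; any
    other address is treated as unmapped. Elements whose mask bit is 0 are
    never read, so they keep their old value and cannot raise PageFault.
    """
    result = list(merge_into)
    for i in range(len(result)):
        if not (mask >> i) & 1:
            continue  # masked off: fault suppressed, destination element merged
        addr = base + i * elem_size
        if addr not in memory:
            raise PageFault(f"unmapped element address {addr:#x}")
        result[i] = memory[addr]
    return result
```

With only address 1000H mapped, a two-element load from 1000H with mask 01b completes and merges element 1, whereas mask 11b touches the unmapped address 1004H and faults.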
EVEX-encoded scalar instructions that must be encoded with EVEX.L'L = 0, do not cause SIMD FP exceptions, and do not
support memory fault suppression follow exception class E10NF.


Table 2-59. Type E10NF Class Exception Conditions


Exception 

Real

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0)



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF(fault- 
code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes or 
less is made while the current privilege level is 3. 
2.7.10 Exception Type E11 (EVEX-only, mem arg, no AC, floating-point exceptions)

EVEX-encoded instructions that can cause SIMD FP exceptions, whose memory operand supports fault suppression, but
that do not cause #AC follow exception class E11.


Table 2-60. Type E11 Class Exception Conditions


Exception 

Real

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

■ If EVEX.L'L != 10b (VL=512).

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


If fault suppression not set, and an illegal address in the SS segment. 




X 

If fault suppression not set, and a memory address referencing the SS segment is 
in a non-canonical form. 

General Protection, 
#GP(0)



X 


If fault suppression not set, and an illegal memory operand effective address in the 
CS, DS, ES, FS or GS segments. 




X 

If fault suppression not set, and the memory address is in a non-canonical form. 

X 

X 



If fault suppression not set, and any part of the operand lies outside the effective 
address space from 0 to FFFFH. 

Page Fault #PF (fault- 
code) 


X 

X 

X 

If fault suppression not set, and a page fault. 

SIMD Floating-Point 
Exception, #XM 

X 

X 

X 

X 

If an unmasked SIMD floating-point exception, [sae] not set, and CR4.OSXMMEXCPT[bit 10] = 1.
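The #XM row can be read as a small decision function. This sketch additionally assumes, from the SSE/AVX exception classes rather than from Table 2-60 itself, that an unmasked SIMD floating-point exception with CR4.OSXMMEXCPT = 0 is reported as #UD instead. The function name and its string return values are illustrative.

```python
from typing import Optional

CR4_OSXMMEXCPT_BIT = 10  # CR4.OSXMMEXCPT[bit 10]


def simd_fp_fault(unmasked_exception: bool, sae: bool, cr4: int) -> Optional[str]:
    """Return the exception reported for a SIMD FP condition, or None."""
    if not unmasked_exception or sae:
        return None  # nothing unmasked, or [sae] suppresses reporting
    if (cr4 >> CR4_OSXMMEXCPT_BIT) & 1:
        return "#XM"  # the row in Table 2-60
    return "#UD"      # assumption: OS has not enabled #XM delivery
```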
2.7.11 Exception Type E12 and E12NP (VSIB mem arg, no AC, no floating-point exceptions)


Table 2-61. Type E12 Class Exception Conditions


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

■ If EVEX.L'L != 10b (VL=512).

■ If vvvv != 1111b.

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

NA 

If the address size attribute is 16 bits.

X 

X 

X 

X 

If ModR/M.mod = '11b'.

X 

X 

X 

X 

If ModR/M.rm != '100b'.

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

X 

X 

X 

X 

If k0 is used (gather or scatter operation).

X 

X 

X 

X 

If index = destination register (gather operation). 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to FFFFH.

Page Fault #PF (fault- 
code) 


X 

X 

X 

For a page fault. 
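The VSIB-specific #UD rows of Table 2-61 are mostly field checks on the ModR/M byte and on register specifiers. A minimal sketch with hypothetical parameter names; it assumes the caller has already extracted the fields, treats `vvvv` as the raw encoded value, and does not cover the generic rows shared with the other EVEX classes.

```python
def e12_vsib_ud_reasons(modrm: int, index_reg: int, dest_reg: int, mask_reg: int,
                        vvvv: int, address_size: int, is_gather: bool) -> list[str]:
    """Return the VSIB-specific Table 2-61 #UD rows that apply (empty if none)."""
    reasons = []
    mod = (modrm >> 6) & 0b11
    rm = modrm & 0b111
    if address_size == 16:
        reasons.append("16-bit address size")
    if mod == 0b11:
        reasons.append("ModR/M.mod = 11b")   # VSIB needs a memory operand
    if rm != 0b100:
        reasons.append("ModR/M.rm != 100b")  # a SIB byte is required
    if vvvv != 0b1111:
        reasons.append("vvvv != 1111b")
    if mask_reg == 0:
        reasons.append("k0 used")
    if is_gather and index_reg == dest_reg:
        reasons.append("index = destination")
    return reasons
```

For instance, ModR/M = 04H (mod 00b, rm 100b) with mask register k1 and distinct index and destination registers yields an empty list, while using k0 and the same register for index and destination reports both rows.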
EVEX-encoded prefetch instructions that do not cause #PF follow exception class E12NP.


Table 2-62. Type E12NP Class Exception Conditions 


Exception 

Real 

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 



If EVEX prefix present. 



X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38.

■ Operand encoding #UD conditions in Table 2-39.

■ Opmask encoding #UD condition of Table 2-40.

■ If EVEX.b != 0.

■ If EVEX.L'L != 10b (VL=512).

X 

X 

X 

X 

If preceded by a LOCK prefix (F0H).



X 

X 

If any REX, F2, F3, or 66 prefixes precede an EVEX prefix.

X 

X 

X 

NA 

If the address size attribute is 16 bits.

X 

X 

X 

X 

If ModR/M.mod = '11b'.

X 

X 

X 

X 

If ModR/M.rm != '100b'. 

X 

X 

X 

X 

If any corresponding CPUID feature flag is '0'.

X 

X 

X 

X 

If k0 is used (gather or scatter operation).

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 

Stack, SS(0) 



X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to FFFFH.
2.8 EXCEPTION CLASSIFICATIONS OF OPMASK INSTRUCTIONS

The exception behavior of VEX-encoded opmask instructions is listed below.

Exception conditions of Opmask instructions that do not address memory are listed as Type K20. 


Table 2-63. TYPE K20 Exception Definition (VEX-Encoded OpMask Instructions w/o Memory Arg) 


Exception 

Real

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 

X 

X 

If the relevant CPUID feature flag is '0'.


X 

X 



If a VEX prefix is present. 




X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 




X 

X 

If any REX, F2, F3, or 66 prefixes precede a VEX prefix. 




X 

X 

If ModRM:[7:6] != 11b.

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1. 
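The K20 row "If ModRM:[7:6] != 11b" is a check on the mod field of the ModR/M byte: the K20 forms take only register operands. A minimal sketch; the helper names are hypothetical.

```python
def modrm_fields(modrm: int) -> tuple[int, int, int]:
    """Split a ModR/M byte into (mod, reg, rm)."""
    return (modrm >> 6) & 0b11, (modrm >> 3) & 0b111, modrm & 0b111


def k20_modrm_ud(modrm: int) -> bool:
    """True if a K20-class encoding would #UD because mod is not 11b."""
    mod, _reg, _rm = modrm_fields(modrm)
    return mod != 0b11
```

For example, ModR/M = CAH splits into mod 11b, reg 001b, rm 010b and is accepted, whereas 0AH has mod 00b and would #UD.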
Exception conditions of Opmask instructions that address memory are listed as Type K21.


Table 2-64. TYPE K21 Exception Definition (VEX-Encoded OpMask Instructions Addressing Memory) 


Exception 

Real

Virtual 80x86 

Protected and 

Compatibility 

64-bit 

Cause of Exception 

Invalid Opcode, #UD 

X 

X 

X 

X 

If the relevant CPUID feature flag is '0'.


X 

X 



If a VEX prefix is present. 




X 

X 

If CR4.OSXSAVE[bit 18]=0.

If any one of the following conditions applies:

■ State requirement, Table 2-37, not met.

■ Opcode independent #UD condition in Table 2-38. 

■ Operand encoding #UD conditions in Table 2-39. 

Device Not Available, 
#NM 

X 

X 

X 

X 

If CR0.TS[bit 3]=1.




X 

X 

If any REX, F2, F3, or 66 prefixes precede a VEX prefix. 

Stack, SS(0) 

X 

X 

X 


For an illegal address in the SS segment. 




X 

If a memory address referencing the SS segment is in a non-canonical form. 

General Protection, 
#GP(0) 



X 


For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null 
segment selector. 




X 

If the memory address is in a non-canonical form. 

X 

X 



If any part of the operand lies outside the effective address space from 0 to 
FFFFH. 

Page Fault #PF(fault- 
code) 


X 

X 

X 

For a page fault. 

Alignment Check 
#AC(0) 


X 

X 

X 

If alignment checking is enabled and an unaligned memory reference of 8 bytes or 
less is made while the current privilege level is 3. 
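The #AC rows depend on whether alignment checking is enabled. This sketch assumes the usual definition (CR0.AM[bit 18] = 1, EFLAGS.AC[bit 18] = 1 and CPL = 3) and treats a reference as unaligned when its address is not a multiple of its size. The function names are hypothetical.

```python
CR0_AM_BIT = 18     # CR0.AM, alignment mask
EFLAGS_AC_BIT = 18  # EFLAGS.AC, alignment check


def alignment_check_enabled(cr0: int, eflags: int, cpl: int) -> bool:
    """Assumed enable condition: CR0.AM = 1, EFLAGS.AC = 1 and CPL = 3."""
    return bool((cr0 >> CR0_AM_BIT) & 1) and bool((eflags >> EFLAGS_AC_BIT) & 1) and cpl == 3


def k21_raises_ac(cr0: int, eflags: int, cpl: int, address: int, size: int) -> bool:
    """True if the K21 #AC(0) row applies to a `size`-byte memory reference."""
    if size > 8:
        return False  # the row only covers references of 8 bytes or less
    return alignment_check_enabled(cr0, eflags, cpl) and address % size != 0
```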
CHAPTER 3

INSTRUCTION SET REFERENCE, A-L 


This chapter describes the instruction set for the Intel 64 and IA-32 architectures (A-L) in IA-32e, protected,
virtual-8086, and real-address modes of operation. The set includes general-purpose, x87 FPU, MMX,
SSE/SSE2/SSE3/SSSE3/SSE4, AESNI/PCLMULQDQ, AVX and system instructions. See also Chapter 4, "Instruction
Set Reference, M-U," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2B, and
Chapter 5, "Instruction Set Reference, V-Z," in the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 2C.

For each instruction, each operand combination is described. A description of the instruction and its operand, an 
operational description, a description of the effect of the instructions on flags in the EFLAGS register, and a 
summary of exceptions that can be generated are also provided. 


3.1 INTERPRETING THE INSTRUCTION REFERENCE PAGES 

This section describes the format of information contained in the instruction reference pages in this chapter. It 
explains notational conventions and abbreviations used in these sections. 


3.1.1 Instruction Format 

The following is an example of the format used for each instruction description in this chapter. The heading below 
introduces the example. The table below provides an example summary table. 

CMC—Complement Carry Flag [this is an example] 


Opcode 

Instruction 

Op/En 

64/32-bit 

Mode 

CPUID 

Feature Flag

Description 

F5 

CMC 

A 

V/V 

NP 

Complement carry flag. 


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

NP 

NA 

NA 

NA 

NA 
3.1.1.1 Opcode Column in the Instruction Summary Table (Instructions without VEX Prefix)

The "Opcode" column in the table above shows the object code produced for each form of the instruction. When
possible, codes are given as hexadecimal bytes in the same order in which they appear in memory. Definitions of
entries other than hexadecimal bytes are as follows:

• REX.W — Indicates the use of a REX prefix that affects operand size or instruction semantics. The ordering of
the REX prefix and other optional/mandatory instruction prefixes is discussed in Chapter 2. Note that REX
prefixes that promote legacy instructions to 64-bit behavior are not listed explicitly in the opcode column.

• /digit — A digit between 0 and 7 indicates that the ModR/M byte of the instruction uses only the r/m (register
or memory) operand. The reg field contains the digit that provides an extension to the instruction's opcode.

• /r — Indicates that the ModR/M byte of the instruction contains a register operand and an r/m operand.

• cb, cw, cd, cp, co, ct — A 1-byte (cb), 2-byte (cw), 4-byte (cd), 6-byte (cp), 8-byte (co) or 10-byte (ct) value
following the opcode. This value is used to specify a code offset and possibly a new value for the code segment
register.

• ib, iw, id, io — A 1-byte (ib), 2-byte (iw), 4-byte (id) or 8-byte (io) immediate operand to the instruction that 
follows the opcode, ModR/M bytes or scale-indexing bytes. The opcode determines if the operand is a signed 
value. All words, doublewords and quadwords are given with the low-order byte first. 

• +rb, +rw, +rd, +ro — Indicates that the lower 3 bits of the opcode byte are used to encode the register operand
without a ModR/M byte. The instruction lists the corresponding hexadecimal value of the opcode byte with the low
3 bits as 000b. In non-64-bit mode, a register code, from 0 through 7, is added to the hexadecimal value of the
opcode byte. In 64-bit mode, the four-bit field formed by REX.B and opcode[2:0] encodes the register
operand of the instruction. "+ro" is applicable only in 64-bit mode. See Table 3-1 for the codes.

• +i — A number used in floating-point instructions when one of the operands is ST(i) from the FPU register stack. 
The number i (which can range from 0 to 7) is added to the hexadecimal byte given at the left of the plus sign 
to form a single opcode byte. 
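As an illustration of the +rd, id and +i notations, the sketch below assembles three register-in-opcode forms: MOV r32, imm32 (B8+rd id), PUSH r64 (50+rd) and FLD ST(i) (D9 C0+i). The choice of these instructions and the helper names are assumptions made for the example; register numbers follow Table 3-1, and a REX prefix with only REX.B set (41H) selects registers 8 through 15, which is valid only in 64-bit mode.

```python
import struct

REX_B_ONLY = 0x41  # REX prefix 0100WRXB with only B set


def _reg_prefix(reg: int) -> bytes:
    """Return the REX prefix needed for register number `reg` (0-15)."""
    if not 0 <= reg <= 15:
        raise ValueError("register number must be 0-15")
    return bytes([REX_B_ONLY]) if reg >= 8 else b""


def encode_mov_r32_imm32(reg: int, imm: int) -> bytes:
    """MOV r32, imm32: opcode B8+rd followed by id (low-order byte first)."""
    opcode = bytes([0xB8 + (reg & 0b111)])
    return _reg_prefix(reg) + opcode + struct.pack("<I", imm & 0xFFFFFFFF)


def encode_push_r64(reg: int) -> bytes:
    """PUSH r64: opcode 50+rd; REX.B supplies bit 3 of the register number."""
    return _reg_prefix(reg) + bytes([0x50 + (reg & 0b111)])


def encode_fld_st(i: int) -> bytes:
    """FLD ST(i): D9 C0+i, with i in 0..7."""
    if not 0 <= i <= 7:
        raise ValueError("ST(i) index must be 0-7")
    return bytes([0xD9, 0xC0 + i])
```

Under these assumptions, MOV EAX, 12345678H assembles to B8 78 56 34 12 (the immediate is stored low-order byte first), and PUSH R15 to 41 57.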


Table 3-1. Register Codes Associated With +rb, +rw, +rd, +ro

byte register                word register                dword register               quadword register (64-Bit Mode only)
Register  REX.B  Reg Field   Register  REX.B  Reg Field   Register  REX.B  Reg Field   Register  REX.B  Reg Field
AL        None   0           AX        None   0           EAX       None   0           RAX       None   0
CL        None   1           CX        None   1           ECX       None   1           RCX       None   1
DL        None   2           DX        None   2           EDX       None   2           RDX       None   2
BL        None   3           BX        None   3           EBX       None   3           RBX       None   3
AH        N.E.   4           SP        None   4           ESP       None   4           N/A       N/A    N/A
CH        N.E.   5           BP        None   5           EBP       None   5           N/A       N/A    N/A
DH        N.E.   6           SI        None   6           ESI       None   6           N/A       N/A    N/A
BH        N.E.   7           DI        None   7           EDI       None   7           N/A       N/A    N/A
SPL       Yes    4           SP        None   4           ESP       None   4           RSP       None   4
BPL       Yes    5           BP        None   5           EBP       None   5           RBP       None   5
SIL       Yes    6           SI        None   6           ESI       None   6           RSI       None   6
DIL       Yes    7           DI        None   7           EDI       None   7           RDI       None   7
Registers R8 - R15 (see below): Available in 64-Bit Mode Only
R8L       Yes    0           R8W       Yes    0           R8D       Yes    0           R8        Yes    0
R9L       Yes    1           R9W       Yes    1           R9D       Yes    1           R9        Yes    1
R10L      Yes    2           R10W      Yes    2           R10D      Yes    2           R10       Yes    2
R11L      Yes    3           R11W      Yes    3           R11D      Yes    3           R11       Yes    3
R12L      Yes    4           R12W      Yes    4           R12D      Yes    4           R12       Yes    4
R13L      Yes    5           R13W      Yes    5           R13D      Yes    5           R13       Yes    5
R14L      Yes    6           R14W      Yes    6           R14D      Yes    6           R14       Yes    6
R15L      Yes    7           R15W      Yes    7           R15D      Yes    7           R15       Yes    7

N.E. = Not encodable.
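The byte-register rows show that the same reg-field value names different registers depending on whether a REX prefix is present (AH through BH without REX, SPL through DIL with REX). A minimal lookup sketch of those rows; the function and parameter names are hypothetical.

```python
LEGACY_BYTE_REGS = ["AL", "CL", "DL", "BL", "AH", "CH", "DH", "BH"]
REX_BYTE_REGS = ["AL", "CL", "DL", "BL", "SPL", "BPL", "SIL", "DIL"]


def byte_register(reg_field: int, rex_present: bool, rex_b: int = 0) -> str:
    """Name the byte register selected by a 3-bit reg field (per Table 3-1)."""
    if not 0 <= reg_field <= 7:
        raise ValueError("reg field is 3 bits")
    if rex_b:
        if not rex_present:
            raise ValueError("REX.B requires a REX prefix")
        return f"R{8 + reg_field}L"  # R8L-R15L, 64-bit mode only
    return (REX_BYTE_REGS if rex_present else LEGACY_BYTE_REGS)[reg_field]
```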


3.1.1.2 Opcode Column in the Instruction Summary Table (Instructions with VEX prefix)

In the Instruction Summary Table, the Opcode column presents each instruction encoded using the VEX prefix in
the following form (including the ModR/M byte if applicable, the immediate byte if applicable):

VEX.[NDS].[128,256].[66,F2,F3].0F/0F3A/0F38.[W0,W1] opcode [/r] [/ib,/is4]

• VEX — Indicates the presence of the VEX prefix is required. The VEX prefix can be encoded using the three- 
byte form (the first byte is C4H), or using the two-byte form (the first byte is C5H). The two-byte form of VEX 
only applies to those instructions that do not require the following fields to be encoded: VEX.mmmmm, VEX.W, 
VEX.X, VEX.B. Refer to Section 2.3 for more detail on the VEX prefix. 

The encoding of various sub-fields of the VEX prefix is described using the following notations: 

— NDS, NDD, DDS: Specifies that VEX.vvvv field is valid for the encoding of a register operand: 

• VEX.NDS: VEX.vvvv encodes the first source register in an instruction syntax where the content of 
source registers will be preserved. 

• VEX.NDD: VEX.vvvv encodes the destination register that cannot be encoded by ModR/M:reg field. 

• VEX.DDS: VEX.vvvv encodes the second source register in a three-operand instruction syntax where 
the content of first source register will be overwritten by the result. 

• If none of NDS, NDD, and DDS is present, VEX.vvvv must be 1111b (i.e. VEX.vvvv does not encode an 
operand). The VEX.vvvv field can be encoded using either the 2-byte or 3-byte form of the VEX prefix. 

- 128,256: VEX.L field can be 0 (denoted by VEX.128 or VEX.LZ) or 1 (denoted by VEX.256). The VEX.L field
can be encoded using either the 2-byte or 3-byte form of the VEX prefix. The presence of the notation
VEX.256 or VEX.128 in the opcode column should be interpreted as follows:

• If VEX.256 is present in the opcode column: The semantics of the instruction must be encoded with
VEX.L = 1. An attempt to encode this instruction with VEX.L = 0 can result in one of two situations: (a)
if a VEX.128 version is defined, the processor will behave according to the defined VEX.128 behavior; (b)
an #UD occurs if there is no VEX.128 version defined.

• If VEX.128 is present in the opcode column but there is no VEX.256 version defined for the same
opcode byte: Two situations apply: (a) For VEX-encoded, 128-bit SIMD integer instructions, software
must encode the instruction with VEX.L = 0. The processor will treat the opcode byte encoded with
VEX.L = 1 by causing an #UD exception; (b) For VEX-encoded, 128-bit packed floating-point instructions,
software must encode the instruction with VEX.L = 0. The processor will treat the opcode byte
encoded with VEX.L = 1 by causing an #UD exception (e.g. VMOVLPS).

• If VEX.LIG is present in the opcode column: The VEX.L value is ignored. This generally applies to
VEX-encoded scalar SIMD floating-point instructions. Scalar SIMD floating-point instructions can be
distinguished by their mnemonic: generally, the last two letters of the instruction mnemonic are
"SS" or "SD", or "SI" for SIMD floating-point conversion instructions.

• If VEX.LZ is present in the opcode column: VEX.L must be encoded as 0B; an #UD occurs if
VEX.L is not zero.
— 66,F2,F3: The presence or absence of these values maps to the VEX.pp field encodings. If absent, this
corresponds to VEX.pp=00B. If present, the corresponding VEX.pp value affects the "opcode" byte in the
same way as a SIMD prefix (66H, F2H or F3H) does to the ensuing opcode byte. Thus a non-zero encoding
of VEX.pp may be considered as an implied 66H/F2H/F3H prefix. The VEX.pp field may be encoded using
either the 2-byte or 3-byte form of the VEX prefix.

— 0F,0F3A,0F38: The presence maps to a valid encoding of the VEX.mmmmm field. Only three encoded
values of VEX.mmmmm are defined as valid, corresponding to the escape byte sequences 0FH, 0F3AH
and 0F38H. The effect of a valid VEX.mmmmm encoding on the ensuing opcode byte is the same as that of the
corresponding escape byte sequence on the ensuing opcode byte for non-VEX encoded instructions. Thus a valid
encoding of VEX.mmmmm may be considered as an implied escape byte sequence of either 0FH, 0F3AH or
0F38H. The VEX.mmmmm field must be encoded using the 3-byte form of the VEX prefix.

— 0F,0F3A,0F38 and 2-byte/3-byte VEX: The presence of 0F3A and 0F38 in the opcode column implies
that the opcode can only be encoded by the three-byte form of VEX. The presence of 0F in the opcode column
does not preclude the opcode from being encoded by the two-byte form of VEX if the semantics of the opcode do not
require any subfield of VEX not present in the two-byte form of the VEX prefix.

— W0: VEX.W=0.

— W1: VEX.W=1.

— The presence of W0/W1 in the opcode column applies to two situations: (a) it is treated as an extended
opcode bit, (b) the instruction semantics support an operand size promotion to 64-bit of a general-purpose
register operand or a 32-bit memory operand. The presence of W1 in the opcode column implies the opcode
must be encoded using the 3-byte form of the VEX prefix. The presence of W0 in the opcode column does
not preclude the opcode from being encoded using the C5H form of the VEX prefix, if the semantics of the opcode
do not require other VEX subfields not present in the two-byte form of the VEX prefix. Please see Section
2.3 on the subfield definitions within VEX.

— WIG: can use the C5H form (if not requiring VEX.mmmmm), or the VEX.W value is ignored in the C4H form of the VEX
prefix.

— If WIG is present, the instruction may be encoded using either the two-byte form or the three-byte form of 
VEX. When encoding the instruction using the three-byte form of VEX, the value of VEX.W is ignored. 

• opcode — Instruction opcode. 

• /is4 — An 8-bit immediate byte is present containing a source register specifier in either imm8[7:4] (for 64-bit
mode) or imm8[6:4] (for 32-bit mode), and instruction-specific payload in imm8[3:0].

• In general, the encoding of the VEX.R, VEX.X and VEX.B fields is not shown explicitly in the opcode column. The
encoding scheme of VEX.R, VEX.X, VEX.B fields must follow the rules defined in Section 2.3.
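The field notations above correspond to bit fields of the two-byte and three-byte VEX prefixes. The following sketch decodes them, assuming the layouts described in Section 2.3: R, X, B and vvvv are stored in inverted form, pp values 01b/10b/11b imply 66H/F3H/F2H, mmmmm values 1/2/3 imply the 0F/0F38/0F3A escapes, and the two-byte form implies the 0F escape with W = 0. The `decode_vex` name and the dictionary keys are illustrative.

```python
PP_IMPLIED_PREFIX = {0b00: None, 0b01: "66", 0b10: "F3", 0b11: "F2"}
MMMMM_ESCAPE = {0b00001: "0F", 0b00010: "0F38", 0b00011: "0F3A"}


def decode_vex(prefix: bytes) -> dict:
    """Decode the payload of a 2-byte (C5H) or 3-byte (C4H) VEX prefix."""
    if len(prefix) >= 2 and prefix[0] == 0xC5:
        b1 = prefix[1]
        return {
            "R": 1 - (b1 >> 7),                     # stored inverted
            "X": 0, "B": 0,                         # not encodable in this form
            "map": "0F",                            # implied by the 2-byte form
            "W": 0,                                 # implied by the 2-byte form
            "vvvv": ((b1 >> 3) & 0b1111) ^ 0b1111,  # stored inverted
            "L": (b1 >> 2) & 1,
            "pp": PP_IMPLIED_PREFIX[b1 & 0b11],
        }
    if len(prefix) >= 3 and prefix[0] == 0xC4:
        b1, b2 = prefix[1], prefix[2]
        return {
            "R": 1 - (b1 >> 7),
            "X": 1 - ((b1 >> 6) & 1),
            "B": 1 - ((b1 >> 5) & 1),
            "map": MMMMM_ESCAPE.get(b1 & 0b11111),  # None for reserved encodings
            "W": b2 >> 7,
            "vvvv": ((b2 >> 3) & 0b1111) ^ 0b1111,
            "L": (b2 >> 2) & 1,
            "pp": PP_IMPLIED_PREFIX[b2 & 0b11],
        }
    raise ValueError("not a complete VEX prefix")
```

For example, the two-byte prefix C5 F4 decodes to vvvv = 0001b, L = 1 (VEX.256) and no implied SIMD prefix, while C4 E3 7D decodes to the 0F3A map, W = 0, L = 1 and an implied 66H prefix.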


EVEX.[NDS/NDD/DDS].[128,256,512,LIG].[66,F2,F3].0F/0F3A/0F38.[W0,W1,WIG] opcode [/r] [ib]

• EVEX — The EVEX prefix is encoded using the four-byte form (the first byte is 62H). Refer to Section 2.6.1 for
more detail on the EVEX prefix.

The encoding of various sub-fields of the EVEX prefix is described using the following notations: 

— NDS, NDD, DDS: implies that the EVEX.vvvv (and EVEX.v') field is valid for the encoding of an operand. It may
specify either the source register (NDS) or the destination register (NDD). DDS expresses a syntax where
vvvv encodes the second source register in a three-operand instruction syntax where the content of the first
source register will be overwritten by the result. If both NDS and NDD are absent (i.e. EVEX.vvvv does not
encode an operand), EVEX.vvvv must be 1111b (and EVEX.v' must be 1b).

— 128, 256, 512, LIG: This corresponds to the vector length; three values are allowed by EVEX: 512-bit, 
256-bit and 128-bit. Alternatively, vector length is ignored (LIG) for certain instructions; this typically 
applies to scalar instructions operating on one data element of a vector register. 

— 66,F2,F3: The presence of these values maps to the EVEX.pp field encodings. The corresponding EVEX.pp
value affects the "opcode" byte in the same way as a SIMD prefix (66H, F2H or F3H) does to the ensuing
opcode byte. Thus a non-zero encoding of EVEX.pp may be considered as an implied 66H/F2H/F3H prefix.

— 0F,0F3A,0F38: The presence maps to a valid encoding of the EVEX.mmm field. Only three encoded values
of EVEX.mmm are defined as valid, corresponding to the escape byte sequences 0FH, 0F3AH and 0F38H.
The effect of a valid EVEX.mmm encoding on the ensuing opcode byte is the same as that of the corresponding
escape byte sequence on the ensuing opcode byte for non-EVEX encoded instructions. Thus a valid
encoding of EVEX.mmm may be considered as an implied escape byte sequence of either 0FH, 0F3AH or
0F38H.

- W0: EVEX.W=0.

- W1: EVEX.W=1.

- WIG: EVEX.W bit ignored.

• opcode — Instruction opcode. 

• In general, the encoding of EVEX.R and R', EVEX.X and X', and EVEX.B and B' fields are not shown explicitly in 
the opcode column. 
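A similar sketch for the four-byte EVEX prefix, assuming the P0/P1/P2 payload layout described in Section 2.6.1: R, X, B, R', vvvv and V' stored in inverted form, the map select in the low bits of P0 (read here from P0[2:0], with values 1/2/3 implying 0F/0F38/0F3A), and L'L selecting 128, 256 or 512 bits. The function name, dictionary keys and table names are hypothetical.

```python
PP_IMPLIED_PREFIX = {0b00: None, 0b01: "66", 0b10: "F3", 0b11: "F2"}
MMM_ESCAPE = {0b001: "0F", 0b010: "0F38", 0b011: "0F3A"}
LL_VECTOR_LENGTH = {0b00: 128, 0b01: 256, 0b10: 512}  # 11b has no vector length here


def decode_evex(prefix: bytes) -> dict:
    """Decode the three payload bytes (P0, P1, P2) that follow the 62H byte."""
    if len(prefix) < 4 or prefix[0] != 0x62:
        raise ValueError("not a complete EVEX prefix")
    p0, p1, p2 = prefix[1], prefix[2], prefix[3]
    return {
        "R": 1 - (p0 >> 7),                      # R, X, B, R' are stored inverted
        "X": 1 - ((p0 >> 6) & 1),
        "B": 1 - ((p0 >> 5) & 1),
        "R_prime": 1 - ((p0 >> 4) & 1),
        "map": MMM_ESCAPE.get(p0 & 0b111),       # None for values not handled here
        "W": p1 >> 7,
        "vvvv": ((p1 >> 3) & 0b1111) ^ 0b1111,   # stored inverted
        "pp": PP_IMPLIED_PREFIX[p1 & 0b11],
        "z": p2 >> 7,
        "vector_length": LL_VECTOR_LENGTH.get((p2 >> 5) & 0b11),
        "b": (p2 >> 4) & 1,
        "V_prime": 1 - ((p2 >> 3) & 1),          # stored inverted
        "aaa": p2 & 0b111,                       # opmask register k0-k7
    }
```

Under these assumptions, the prefix 62 F1 7C 48 decodes to the 0F map, W = 0, no implied SIMD prefix, an unused vvvv (1111b as encoded) and a 512-bit vector length with opmask k0.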

3.1.1.3 Instruction Column in the Opcode Summary Table 

The "Instruction" column gives the syntax of the instruction statement as it would appear in an ASM386 program. 
The following is a list of the symbols used to represent operands in the instruction statements: 

• rel8 — A relative address in the range from 128 bytes before the end of the instruction to 127 bytes after the 
end of the instruction. 

• rel16, rel32 — A relative address within the same code segment as the instruction assembled. The rel16
symbol applies to instructions with an operand-size attribute of 16 bits; the rel32 symbol applies to instructions
with an operand-size attribute of 32 bits.

• ptr16:16, ptr16:32 — A far pointer, typically to a code segment different from that of the instruction. The
notation 16:16 indicates that the value of the pointer has two parts. The value to the left of the colon is a
16-bit selector or value destined for the code segment register. The value to the right corresponds to the offset
within the destination segment. The ptr16:16 symbol is used when the instruction's operand-size attribute is
16 bits; the ptr16:32 symbol is used when the operand-size attribute is 32 bits.

• r8 — One of the byte general-purpose registers: AL, CL, DL, BL, AH, CH, DH, BH, BPL, SPL, DIL and SIL; or one
of the byte registers (R8L - R15L) available when using REX.R and 64-bit mode.

• r16 — One of the word general-purpose registers: AX, CX, DX, BX, SP, BP, SI, DI; or one of the word registers
(R8-R15) available when using REX.R and 64-bit mode.

• r32 — One of the doubleword general-purpose registers: EAX, ECX, EDX, EBX, ESP, EBP, ESI, EDI; or one of 
the doubleword registers (R8D - R15D) available when using REX.R in 64-bit mode. 

• r64 — One of the quadword general-purpose registers: RAX, RBX, RCX, RDX, RDI, RSI, RBP, RSP, R8-R15. 
These are available when using REX.R and 64-bit mode. 

• imm8 — An immediate byte value. The imm8 symbol is a signed number between -128 and +127 inclusive.
For instructions in which imm8 is combined with a word or doubleword operand, the immediate value is
sign-extended to form a word or doubleword. The upper byte of the word is filled with the topmost bit of the
immediate value.

• imm16 — An immediate word value used for instructions whose operand-size attribute is 16 bits. This is a
number between -32,768 and +32,767 inclusive.

• imm32 — An immediate doubleword value used for instructions whose operand-size attribute is 32
bits. It allows the use of a number between +2,147,483,647 and -2,147,483,648 inclusive.

• imm64 — An immediate quadword value used for instructions whose operand-size attribute is 64 bits.
The value allows the use of a number between +9,223,372,036,854,775,807 and
-9,223,372,036,854,775,808 inclusive.

• r/m8 — A byte operand that is either the contents of a byte general-purpose register (AL, CL, DL, BL, AH, CH,
DH, BH, BPL, SPL, DIL and SIL) or a byte from memory. Byte registers R8L - R15L are available using REX.R in
64-bit mode.

• r/m16 — A word general-purpose register or memory operand used for instructions whose operand-size
attribute is 16 bits. The word general-purpose registers are: AX, CX, DX, BX, SP, BP, SI, DI. The contents of
memory are found at the address provided by the effective address computation. Word registers R8W - R15W
are available using REX.R in 64-bit mode.
• r/m32 — A doubleword general-purpose register or memory operand used for instructions whose
operand-size attribute is 32 bits. The doubleword general-purpose registers are: EAX, ECX, EDX, EBX, ESP, EBP, ESI,
EDI. The contents of memory are found at the address provided by the effective address computation.
Doubleword registers R8D - R15D are available when using REX.R in 64-bit mode.

• r/m64 — A quadword general-purpose register or memory operand used for instructions whose operand-size
attribute is 64 bits when using REX.W. Quadword general-purpose registers are: RAX, RBX, RCX, RDX, RDI,
RSI, RBP, RSP, R8-R15; these are available only in 64-bit mode. The contents of memory are found at the
address provided by the effective address computation.

• m — A 16-, 32- or 64-bit operand in memory. 

• m8 — A byte operand in memory, usually expressed as a variable or array name, but pointed to by the 
DS:(E)SI or ES:(E)DI registers. In 64-bit mode, it is pointed to by the RSI or RDI registers. 

• m16 — A word operand in memory, usually expressed as a variable or array name, but pointed to by the
DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions.

• m32 — A doubleword operand in memory, usually expressed as a variable or array name, but pointed to by the 
DS:(E)SI or ES:(E)DI registers. This nomenclature is used only with the string instructions. 

• m64 — A quadword operand in memory.

• m128 — A double quadword operand in memory.

• m16:16, m16:32 & m16:64 — A memory operand containing a far pointer composed of two numbers. The
number to the left of the colon corresponds to the pointer's segment selector. The number to the right
corresponds to its offset.

• m16&32, m16&16, m32&32, m16&64 — A memory operand consisting of data item pairs whose sizes are
indicated on the left and the right side of the ampersand. All memory addressing modes are allowed. The
m16&16 and m32&32 operands are used by the BOUND instruction to provide an operand containing upper
and lower bounds for array indices. The m16&32 operand is used by LIDT and LGDT to provide a word with
which to load the limit field, and a doubleword with which to load the base field of the corresponding GDTR and
IDTR registers. The m16&64 operand is used by LIDT and LGDT in 64-bit mode to provide a word with which to
load the limit field, and a quadword with which to load the base field of the corresponding GDTR and IDTR
registers.

• moffs8, moffs16, moffs32, moffs64 — A simple memory variable (memory offset) of type byte, word,
doubleword, or quadword used by some variants of the MOV instruction. The actual address is given by a simple offset
relative to the segment base. No ModR/M byte is used in the instruction. The number shown with moffs
indicates its size, which is determined by the address-size attribute of the instruction.

• Sreg — A segment register. The segment register bit assignments are ES = 0, CS = 1, SS = 2, DS = 3, FS = 4, 
and GS = 5. 

• m32fp, m64fp, m80fp — A single-precision, double-precision, and double extended-precision (respectively) 
floating-point operand in memory. These symbols designate floating-point values that are used as operands for 
x87 FPU floating-point instructions. 

• m16int, m32int, m64int — A word, doubleword, and quadword integer (respectively) operand in memory.
These symbols designate integers that are used as operands for x87 FPU integer instructions.

• ST or ST(0) — The top element of the FPU register stack.

• ST(i) — The ith element from the top of the FPU register stack (i ← 0 through 7).

• mm — An MMX register. The 64-bit MMX registers are: MM0 through MM7.

• mm/m32 — The low order 32 bits of an MMX register or a 32-bit memory operand. The 64-bit MMX registers
are: MM0 through MM7. The contents of memory are found at the address provided by the effective address
computation.

• mm/m64 — An MMX register or a 64-bit memory operand. The 64-bit MMX registers are: MM0 through MM7.
The contents of memory are found at the address provided by the effective address computation.

• xmm — An XMM register. The 128-bit XMM registers are: XMM0 through XMM7; XMM8 through XMM15 are
available using REX.R in 64-bit mode.

• xmm/m32 — An XMM register or a 32-bit memory operand. The 128-bit XMM registers are XMM0 through
XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at
the address provided by the effective address computation.
• xmm/m64 — An XMM register or a 64-bit memory operand. The 128-bit SIMD floating-point registers are
XMM0 through XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of
memory are found at the address provided by the effective address computation.

• xmm/m128 — An XMM register or a 128-bit memory operand. The 128-bit XMM registers are XMM0 through
XMM7; XMM8 through XMM15 are available using REX.R in 64-bit mode. The contents of memory are found at
the address provided by the effective address computation.

• <XMM0> — Indicates implied use of the XMM0 register.

When there is ambiguity, xmm1 indicates the first source operand using an XMM register and xmm2 the second
source operand using an XMM register.

Some instructions use the XMM0 register as the third source operand, indicated by <XMM0>. The use of the
third XMM register operand is implicit in the instruction encoding and does not affect the ModR/M encoding.

• ymm — A YMM register. The 256-bit YMM registers are: YMM0 through YMM7; YMM8 through YMM15 are
available in 64-bit mode.

• m256 — A 32-byte operand in memory. This nomenclature is used only with AVX instructions.

• ymm/m256 — A YMM register or 256-bit memory operand.

• <YMM0> — Indicates use of the YMM0 register as an implicit argument.

• bnd — A 128-bit bounds register: BND0 through BND3.

• mib — A memory operand using the SIB addressing form, where the index register is not used in the address
calculation and the scale is ignored. Only the base and displacement are used in the effective address calculation.

• m512 — A 64-byte operand in memory. 

• zmm/m512 — A ZMM register or 512-bit memory operand.

• {k1}{z} — A mask register used as instruction writemask. The 64-bit k registers are: k1 through k7.
Writemask specification is available exclusively via the EVEX prefix. The masking can be done either as
merging-masking, where the old values are preserved for masked-out elements, or as zeroing-masking, where
masked-out elements are set to zero. The type of masking is determined by the EVEX.z bit.

• {k1} — Without {z}: a mask register used as instruction writemask for instructions that do not allow
zeroing-masking but support merging-masking. This corresponds to instructions that require the value of the aaa
field to be different from 0 (e.g., gather) and store-type instructions, which allow only merging-masking.

• k1 — A mask register used as a regular operand (either destination or source). The 64-bit k registers are: k0
through k7.

• mV — A vector memory operand; the operand size is dependent on the instruction. 

• vm32{x,y,z} — A vector array of memory operands specified using VSIB memory addressing. The array of
memory addresses is specified using a common base register, a constant scale factor, and a vector index
register whose individual elements are 32-bit index values in an XMM register (vm32x), a YMM register (vm32y), or
a ZMM register (vm32z).

• vm64{x,y,z} — A vector array of memory operands specified using VSIB memory addressing. The array of
memory addresses is specified using a common base register, a constant scale factor, and a vector index
register whose individual elements are 64-bit index values in an XMM register (vm64x), a YMM register (vm64y), or
a ZMM register (vm64z).

• zmm/m512/m32bcst — An operand that can be a ZMM register, a 512-bit memory location, or a 512-bit
vector loaded (broadcast) from a 32-bit memory location.

• zmm/m512/m64bcst — An operand that can be a ZMM register, a 512-bit memory location, or a 512-bit
vector loaded (broadcast) from a 64-bit memory location.

• <ZMM0> — Indicates use of the ZMM0 register as an implicit argument.

• {er} — Indicates support for embedded rounding control, which is only applicable to the register-register form 
of the instruction. This also implies support for SAE (Suppress All Exceptions). 

• {sae} — Indicates support for SAE (Suppress All Exceptions). This is used for instructions that support SAE, 
but do not support embedded rounding control. 

• SRC1 — Denotes the first source operand in the instruction syntax of an instruction encoded with the
VEX/EVEX prefix and having two or more source operands.

• SRC2 — Denotes the second source operand in the instruction syntax of an instruction encoded with the
VEX/EVEX prefix and having two or more source operands.

• SRC3 — Denotes the third source operand in the instruction syntax of an instruction encoded with the 
VEX/EVEX prefix and having three source operands. 

• SRC — The source in a single-source instruction. 

• DST — The destination in an instruction. This field is encoded by reg_field.
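The immediate ranges listed above determine which immediate sizes can hold a given constant. The following C sketch classifies a value by the narrowest immediate width whose signed range contains it. The helper names (fits_imm8, smallest_imm_width, and so on) are hypothetical, and a real encoder must also consider which immediate forms a particular opcode actually offers.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helpers: does v fall inside the signed range of each immediate size? */
static bool fits_imm8(int64_t v)  { return v >= INT8_MIN  && v <= INT8_MAX;  } /* -128 .. +127 */
static bool fits_imm16(int64_t v) { return v >= INT16_MIN && v <= INT16_MAX; } /* -32,768 .. +32,767 */
static bool fits_imm32(int64_t v) { return v >= INT32_MIN && v <= INT32_MAX; } /* -2^31 .. 2^31 - 1 */

/* Hypothetical helper: narrowest immediate width (8, 16, 32 or 64 bits) whose
   signed range holds v. Every int64_t value fits imm64, so 64 is the fallback. */
static int smallest_imm_width(int64_t v)
{
    if (fits_imm8(v))
        return 8;
    if (fits_imm16(v))
        return 16;
    if (fits_imm32(v))
        return 32;
    return 64;
}
```

For example, smallest_imm_width(127) is 8, smallest_imm_width(128) is 16, and smallest_imm_width(40000) is 32.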
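The difference between merging-masking and zeroing-masking described for {k1}{z} can be modeled in scalar C. This is a minimal sketch that assumes sixteen 32-bit elements (one 512-bit ZMM register) and a 16-bit writemask. apply_writemask and its parameters are hypothetical names, and the zeroing flag stands in for the EVEX.z bit.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define ZMM_DWORDS 16  /* sixteen 32-bit elements in a 512-bit register */

/* Hypothetical helper: write result[] into dest[] under writemask k (bit i controls
   element i). Elements whose mask bit is 1 receive the new result. Elements whose
   mask bit is 0 keep their old value (merging-masking) or are cleared (zeroing-masking). */
static void apply_writemask(int32_t dest[ZMM_DWORDS], const int32_t result[ZMM_DWORDS],
                            uint16_t k, bool zeroing /* models EVEX.z */)
{
    for (size_t i = 0; i < ZMM_DWORDS; i++) {
        if ((k >> i) & 1u)
            dest[i] = result[i];
        else if (zeroing)
            dest[i] = 0;
        /* otherwise merging-masking: dest[i] is left unchanged */
    }
}
```

With k = 00FFH, elements 0 through 7 receive the new results in both modes. Elements 8 through 15 keep their previous contents under merging-masking and become 0 under zeroing-masking.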
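A vm32 operand can be read as one memory address per index element. The sketch below emulates a doubleword gather in scalar C, assuming that each element address is base + index[i] * scale and that the 32-bit indices are signed values. gather_dwords_vm32 is a hypothetical helper. It deliberately ignores masking, faults, and the displacement that a real VSIB operand may also carry.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of a doubleword gather through a vm32 operand.
   Element i is loaded from base + index[i] * scale, where each 32-bit index is
   signed and scale is 1, 2, 4 or 8. Masking, faults and displacement are not modeled. */
static void gather_dwords_vm32(const void *base, const int32_t *index, size_t n,
                               unsigned scale, int32_t *out)
{
    const unsigned char *b = (const unsigned char *)base;
    for (size_t i = 0; i < n; i++) {
        const unsigned char *addr = b + (ptrdiff_t)index[i] * (ptrdiff_t)scale;
        memcpy(&out[i], addr, sizeof out[i]);  /* one 32-bit element per address */
    }
}
```

Because the indices are signed, a negative index element addresses memory below the base.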

3.1.1.4 Operand Encoding Column in the Instruction Summary Table 

The "operand encoding" column is abbreviated as Op/En in the Instruction Summary table heading. Instruction
operand encoding information is provided for each assembly instruction syntax using a letter to cross-reference to
a row entry in the operand encoding definition table that follows the instruction summary table. The operand
encoding table in each instruction reference page lists each instruction operand (according to each instruction
syntax and operand ordering shown in the instruction column) relative to the ModRM byte, VEX.vvvv field, or
additional operand encoding placement.

EVEX encoded instructions employ compressed disp8*N encoding of the displacement bytes, where N is defined in
Table 2-34 and Table 2-35 according to the tupletype. The Op/En column of an EVEX encoded instruction uses an
abbreviation that corresponds to the tupletype abbreviation (and may include an additional abbreviation related to
ModR/M and vvvv encoding). Most EVEX encoded instructions that have a VEX encoded equivalent keep its ModR/M
and vvvv encoding order. In such cases, the tuple abbreviation is shown and the ModR/M, vvvv encoding
abbreviation may be omitted.
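With compressed disp8*N, the stored displacement byte is multiplied by N when the effective address is formed. An encoder can therefore use the one-byte form only when the displacement is an exact multiple of N and the quotient fits in a signed byte. This is a minimal C sketch that assumes N has already been looked up from the tuple-type tables. The value N = 64 in the usage note is only an example, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoder helper: can displacement disp be stored as a compressed
   disp8*N byte? n is the scale factor from the tuple-type tables (assumed known). */
static bool encode_disp8xN(int32_t disp, int32_t n, int8_t *disp8)
{
    if (n <= 0 || disp % n != 0)
        return false;                 /* not an exact multiple of N */
    int32_t scaled = disp / n;
    if (scaled < INT8_MIN || scaled > INT8_MAX)
        return false;                 /* quotient does not fit in a signed byte */
    *disp8 = (int8_t)scaled;
    return true;
}

/* Hypothetical decoder helper: the displacement used in the address is disp8 * N. */
static int32_t decode_disp8xN(int8_t disp8, int32_t n)
{
    return (int32_t)disp8 * n;
}
```

With N = 64, a displacement of 128 is stored as the byte 2. A displacement of 100 is not a multiple of 64, so it would need a full 32-bit displacement instead.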


NOTES 

• The letters in the Op/En column of an instruction apply ONLY to the encoding definition table 
immediately following the instruction summary table. 

• In the encoding definition table, the letter 'r' within a pair of parentheses denotes that the content of
the operand will be read by the processor. The letter 'w' within a pair of parentheses denotes that the
content of the operand will be updated by the processor.

3.1.1.5 64/32-bit Mode Column in the Instruction Summary Table

The "64/32-bit Mode" column indicates whether the opcode sequence is supported in (a) 64-bit mode or (b) the
Compatibility mode and other IA-32 modes that apply in conjunction with the CPUID feature flag associated with
specific instruction extensions.

The 64-bit mode support is to the left of the 'slash' and has the following notation: 

• V — Supported.

• I — Not supported. 

• N.E. — Indicates an instruction syntax is not encodable in 64-bit mode (it may represent part of a sequence of
valid instructions in other modes).

• N.P. — Indicates the REX prefix does not affect the legacy instruction in 64-bit mode. 

• N.I. — Indicates the opcode is treated as a new instruction in 64-bit mode.

• N.S. — Indicates an instruction syntax that requires an address override prefix in 64-bit mode and is not 
supported. Using an address override prefix in 64-bit mode may result in model-specific execution behavior. 
The Compatibility/Legacy Mode support is to the right of the 'slash' and has the following notation:

• V — Supported. 

• I — Not supported. 

• N.E. — Indicates an Intel 64 instruction mnemonics/syntax that is not encodable; the opcode sequence is not 
applicable as an individual instruction in compatibility mode or IA-32 mode. The opcode may represent a valid 
sequence of legacy IA-32 instructions. 

3.1.1.6 CPUID Support Column in the Instruction Summary Table 

The fourth column holds abbreviated CPUID feature flags (e.g., appropriate bit in CPUID.1.ECX, CPUID.1.EDX
for SSE/SSE2/SSE3/SSSE3/SSE4.1/SSE4.2/AESNI/PCLMULQDQ/AVX/RDRAND support) that indicate processor
support for the instruction. If the corresponding flag is '0', the instruction will generate #UD.
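Because an unsupported instruction generates #UD, software can check the feature flag named in this column before it executes the instruction. The sketch below reads CPUID leaf 1 through the `__get_cpuid` helper from the `<cpuid.h>` header supplied by GCC and Clang. It assumes an x86 target and one of those compilers, and otherwise returns -1. has_sse2, has_sse4_2, and cpuid_leaf1_bit are hypothetical wrappers. The bit positions used are CPUID.01H:EDX bit 26 for SSE2 and CPUID.01H:ECX bit 20 for SSE4.2.

```c
#include <stdbool.h>

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_GET_CPUID 1
#endif

/* Hypothetical helper: returns 1 if the given bit of CPUID.01H:EDX (in_edx) or
   CPUID.01H:ECX is set, 0 if it is clear, and -1 if CPUID cannot be queried here. */
static int cpuid_leaf1_bit(bool in_edx, unsigned bit)
{
#ifdef HAVE_GET_CPUID
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return -1;                    /* leaf 1 not reported */
    unsigned int reg = in_edx ? edx : ecx;
    return (int)((reg >> bit) & 1u);
#else
    (void)in_edx;
    (void)bit;
    return -1;                        /* non-x86 target or unsupported compiler */
#endif
}

/* Hypothetical wrappers for two flags from the CPUID Feature Flag column. */
static int has_sse2(void)   { return cpuid_leaf1_bit(true, 26); }  /* CPUID.01H:EDX[26] */
static int has_sse4_2(void) { return cpuid_leaf1_bit(false, 20); } /* CPUID.01H:ECX[20] */
```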

3.1.1.7 Description Column in the Instruction Summary Table 

The "Description" column briefly explains forms of the instruction. 

3.1.1.8 Description Section 

Each instruction is then described by a number of information sections. The "Description" section describes the
purpose of the instruction and required operands in more detail.

Summary of terms that may be used in the description section: 

• Legacy SSE — Refers to SSE, SSE2, SSE3, SSSE3, SSE4, AESNI, PCLMULQDQ and any future instruction sets 
referencing XMM registers and encoded without a VEX prefix. 

• VEX.vvvv — The VEX bit field specifying a source or destination register (in 1's complement form).

• rm_field — Shorthand for the ModR/M r/m field and any REX.B.

• reg_field — Shorthand for the ModR/M reg field and any REX.R.
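Each of these three shorthands reduces to a few bit operations. This C sketch shows how a decoder might recover register numbers. It assumes the REX prefix byte has the layout 0100WRXB and is passed as 0 when no REX prefix is present. VEX-encoded instructions carry their R and B bits in the VEX prefix instead, also in inverted form, and this sketch does not model that case. The function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical helper: VEX.vvvv holds a register number in 1's complement (inverted) form. */
static unsigned vex_vvvv_register(unsigned vvvv)
{
    return (~vvvv) & 0xFu;
}

/* Hypothetical helper for reg_field: ModR/M.reg (bits 5:3) extended with
   REX.R (bit 2 of REX = 0100WRXB) as bit 3. */
static unsigned reg_field(uint8_t modrm, uint8_t rex)
{
    return (((rex >> 2) & 1u) << 3) | ((modrm >> 3) & 7u);
}

/* Hypothetical helper for rm_field: ModR/M.r/m (bits 2:0) extended with
   REX.B (bit 0 of REX) as bit 3. */
static unsigned rm_field(uint8_t modrm, uint8_t rex)
{
    return ((rex & 1u) << 3) | (modrm & 7u);
}
```

For example, a vvvv value of 0110B names register 9, and ModR/M byte D8H with REX.R set gives reg_field 11.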

3.1.1.9 Operation Section 

The "Operation" section contains an algorithm description (frequently written in pseudo-code) for the instruction. 
Algorithms are composed of the following elements: 

• Comments are enclosed within the symbol pairs "(*" and "*)". 

• Compound statements are enclosed in keywords, such as: IF, THEN, ELSE and FI for an if statement; DO and
OD for a do statement; or CASE... OF for a case statement.

• A register name implies the contents of the register. A register name enclosed in brackets implies the contents 
of the location whose address is contained in that register. For example, ES:[DI] indicates the contents of the 
location whose ES segment relative address is in register DI. [SI] indicates the contents of the address 
contained in register SI relative to the SI register's default segment (DS) or the overridden segment. 

• Parentheses around the "E" in a general-purpose register name, such as (E)SI, indicate that the offset is read
from the SI register if the address-size attribute is 16, or from the ESI register if the address-size attribute is 32.
Parentheses around the "R" in a general-purpose register name, such as (R)SI, used in the presence of a 64-bit
register definition, indicate that the offset is read from the 64-bit RSI register if the address-size attribute is 64.

• Brackets are used for memory operands, where they mean that the contents of the memory location are a
segment-relative offset. For example, [SRC] indicates that the content of the source operand is a
segment-relative offset.

• A ← B indicates that the value of B is assigned to A.

• The symbols =, ≠, >, <, ≥, and ≤ are relational operators used to compare two values, meaning equal, not
equal, greater than, less than, greater or equal, and less or equal, respectively. A relational expression such as
A = B is TRUE if the value of A is equal to B; otherwise it is FALSE.
• The expressions "<< COUNT" and ">> COUNT" indicate that the destination operand should be shifted left or right
by the number of bits indicated by the count operand.
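The (E)SI and (R)SI notation selects how many low-order bits of the register supply the offset. This is a minimal C sketch that assumes the full 64-bit register contents are available. string_offset is a hypothetical helper name.

```c
#include <stdint.h>

/* Hypothetical helper: offset taken from SI/ESI/RSI for a given address-size
   attribute (16, 32 or 64). rsi is the full 64-bit register value. */
static uint64_t string_offset(uint64_t rsi, unsigned address_size)
{
    switch (address_size) {
    case 16:
        return rsi & 0xFFFFu;          /* SI: low 16 bits */
    case 32:
        return rsi & 0xFFFFFFFFu;      /* ESI: low 32 bits */
    default:
        return rsi;                    /* RSI: all 64 bits (address size 64) */
    }
}
```

For RSI = 1122334455667788H, the offsets are 7788H, 55667788H, and 1122334455667788H for address sizes 16, 32, and 64.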

The following identifiers are used in the algorithmic descriptions: 

• OperandSize and AddressSize — The OperandSize identifier represents the operand-size attribute of the
instruction, which is 16, 32, or 64 bits. The AddressSize identifier represents the address-size attribute, which
is 16, 32, or 64 bits. For example, the following pseudo-code indicates that the operand-size attribute depends
on the form of the MOV instruction used.

IF Instruction = MOVW
    THEN OperandSize ← 16;
ELSE
    IF Instruction = MOVD
        THEN OperandSize ← 32;
    ELSE
        IF Instruction = MOVQ
            THEN OperandSize ← 64;
        FI;
    FI;
FI;

See "Operand-Size and Address-Size Attributes" in Chapter 3 of the Intel® 64 and IA-32 Architectures 
Software Developer's Manual, Volume 1, for guidelines on how these attributes are determined. 

• StackAddrSize — Represents the stack address-size attribute associated with the instruction, which has a
value of 16, 32, or 64 bits. See "Address-Size Attribute for Stack" in Chapter 6, "Procedure Calls, Interrupts, and
Exceptions," of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.

• SRC — Represents the source operand. 

• DEST — Represents the destination operand. 

• VLMAX — The maximum vector register width pertaining to the instruction. This is not the vector-length
encoding in the instruction's prefix but is instead determined by the current value of XCR0. For existing
processors, VLMAX is 256 whenever XCR0.YMM[bit 2] is 1. Future processors may define new bits in XCR0
whose setting may imply other values for VLMAX.


VLMAX Definition

XCR0 Component | VLMAX
XCR0.YMM       | 256


The following functions are used in the algorithmic descriptions: 

• ZeroExtend(value) — Returns a value zero-extended to the operand-size attribute of the instruction. For 
example, if the operand-size attribute is 32, zero extending a byte value of -10 converts the byte from F6H to 
a doubleword value of 000000F6H. If the value passed to the ZeroExtend function and the operand-size 
attribute are the same size, ZeroExtend returns the value unaltered. 

• SignExtend(value) — Returns a value sign-extended to the operand-size attribute of the instruction. For 
example, if the operand-size attribute is 32, sign extending a byte containing the value -10 converts the byte 
from F6H to a doubleword value of FFFFFFF6H. If the value passed to the SignExtend function and the operand- 
size attribute are the same size, SignExtend returns the value unaltered. 

• SaturateSignedWordToSignedByte — Converts a signed 16-bit value to a signed 8-bit value. If the signed 
16-bit value is less than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it 
is represented by the saturated value 127 (7FH). 

• SaturateSignedDwordToSignedWord — Converts a signed 32-bit value to a signed 16-bit value. If the 
signed 32-bit value is less than -32768, it is represented by the saturated value -32768 (8000H); if it is greater 
than 32767, it is represented by the saturated value 32767 (7FFFH). 
• SaturateSignedWordToUnsignedByte — Converts a signed 16-bit value to an unsigned 8-bit value. If the
signed 16-bit value is less than zero, it is represented by the saturated value zero (00H); if it is greater than
255, it is represented by the saturated value 255 (FFH).

• SaturateToSignedByte — Represents the result of an operation as a signed 8-bit value. If the result is less
than -128, it is represented by the saturated value -128 (80H); if it is greater than 127, it is represented by
the saturated value 127 (7FH).

• SaturateToSignedWord — Represents the result of an operation as a signed 16-bit value. If the result is less
than -32768, it is represented by the saturated value -32768 (8000H); if it is greater than 32767, it is
represented by the saturated value 32767 (7FFFH).

• SaturateToUnsignedByte — Represents the result of an operation as an unsigned 8-bit value. If the result is less
than zero, it is represented by the saturated value zero (00H); if it is greater than 255, it is represented by the
saturated value 255 (FFH).

• SaturateToUnsignedWord — Represents the result of an operation as an unsigned 16-bit value. If the result is
less than zero, it is represented by the saturated value zero (0000H); if it is greater than 65535, it is represented
by the saturated value 65535 (FFFFH).

• LowOrderWord(DEST * SRC) — Multiplies a word operand by a word operand and stores the least significant
word of the doubleword result in the destination operand. 

• HighOrderWord(DEST * SRC) — Multiplies a word operand by a word operand and stores the most 
significant word of the doubleword result in the destination operand. 

• Push(value) — Pushes a value onto the stack. The number of bytes pushed is determined by the operand-size 
attribute of the instruction. See the "Operation" subsection of the "PUSH—Push Word, Doubleword or 
Quadword Onto the Stack" section in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 2B. 

• Pop() — Removes the value from the top of the stack and returns it. The statement EAX ← Pop(); assigns to
EAX the 32-bit value from the top of the stack. Pop will return either a word, a doubleword or a quadword
depending on the operand-size attribute. See the "Operation" subsection in the "POP—Pop a Value from the
Stack" section of Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 2B.

• PopRegisterStack — Marks the FPU ST(0) register as empty and increments the FPU register stack pointer 
(TOP) by 1. 

• Switch-Tasks — Performs a task switch. 

• Bit(BitBase, BitOffset) — Returns the value of a bit within a bit string. The bit string is a sequence of bits in 
memory or a register. Bits are numbered from low-order to high-order within registers and within memory 
bytes. If the BitBase is a register, the BitOffset can be in the range 0 to [15, 31, 63] depending on the mode 
and register size. See Figure 3-1: the function Bit[RAX, 21] is illustrated. 


(Figure: a 64-bit register with bit positions 63, 31, 21, and 0 marked; the arrow indicates BitOffset ← 21.)

Figure 3-1. Bit Offset for BIT[RAX, 21]
If BitBase is a memory address, the BitOffset has different ranges depending on the operand size
(see Table 3-2).


Table 3-2. Range of Bit Positions Specified by Bit Offset Operands

Operand Size | Immediate BitOffset | Register BitOffset
16           | 0 to 15             | -2^15 to 2^15 - 1
32           | 0 to 31             | -2^31 to 2^31 - 1
64           | 0 to 63             | -2^63 to 2^63 - 1


The addressed bit is numbered (BitOffset MOD 8) within the byte at address (BitBase + (BitOffset DIV 8)), where
DIV is signed division with rounding towards negative infinity and MOD returns a non-negative number (see
Figure 3-2).


(Figure: bit positions 7 through 0 of the bytes around BitBase. A BitOffset of +13 selects bit 5 of the byte at BitBase + 1; a second example shows a negative BitOffset selecting a bit in a byte below BitBase.)

Figure 3-2. Memory Bit Indexing
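Several of the helper functions listed above can be modeled directly in C. For brevity, the sketch below fixes the sizes: a byte is extended to a doubleword, and the saturations take 32-bit inputs. It also assumes that LowOrderWord and HighOrderWord operate on signed words. An unsigned variant would produce a different high word for negative inputs. All function names are hypothetical.

```c
#include <stdint.h>

/* Hypothetical models of ZeroExtend and SignExtend for a byte and a 32-bit operand size. */
static uint32_t zero_extend8_to_32(uint8_t v) { return (uint32_t)v; }
static uint32_t sign_extend8_to_32(uint8_t v) { return (uint32_t)(int32_t)(int8_t)v; }

/* Hypothetical saturating conversions: out-of-range values clamp to the nearest limit. */
static int8_t saturate_to_signed_byte(int32_t v)
{
    return v < -128 ? -128 : v > 127 ? 127 : (int8_t)v;          /* 80H .. 7FH */
}

static uint8_t saturate_to_unsigned_byte(int32_t v)
{
    return v < 0 ? 0 : v > 255 ? 255 : (uint8_t)v;              /* 00H .. FFH */
}

static int16_t saturate_to_signed_word(int32_t v)
{
    return v < -32768 ? -32768 : v > 32767 ? 32767 : (int16_t)v; /* 8000H .. 7FFFH */
}

static uint16_t saturate_to_unsigned_word(int32_t v)
{
    return v < 0 ? 0 : v > 65535 ? 65535 : (uint16_t)v;         /* 0000H .. FFFFH */
}

/* Hypothetical LowOrderWord / HighOrderWord of a signed 16 x 16 -> 32-bit product. */
static uint16_t low_order_word(int16_t a, int16_t b)
{
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(product & 0xFFFFu);
}

static uint16_t high_order_word(int16_t a, int16_t b)
{
    uint32_t product = (uint32_t)((int32_t)a * (int32_t)b);
    return (uint16_t)(product >> 16);
}
```

Applied to the byte F6H (-10), zero_extend8_to_32 yields 000000F6H and sign_extend8_to_32 yields FFFFFFF6H. These match the ZeroExtend and SignExtend examples in the text.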
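The memory form of Bit(BitBase, BitOffset) relies on floor division. For negative offsets, this differs from C's truncating / and % operators. This sketch computes the byte displacement and bit number, and it also shows the register form. The function names are hypothetical, and the memory pointer is assumed to stay within a valid buffer.

```c
#include <stdint.h>

/* Hypothetical helper for the register form: BitOffset in 0..63. */
static unsigned register_bit(uint64_t reg, unsigned bit_offset)
{
    return (unsigned)((reg >> bit_offset) & 1u);
}

/* Hypothetical helper for the memory form: split BitOffset into a byte displacement
   (BitOffset DIV 8, rounded toward negative infinity) and a bit number
   (BitOffset MOD 8, always 0..7). C's / and % truncate toward zero, so negative
   offsets need a correction. */
static void locate_memory_bit(int64_t bit_offset, int64_t *byte_disp, unsigned *bit_num)
{
    int64_t q = bit_offset / 8;
    int64_t r = bit_offset % 8;
    if (r < 0) {        /* convert truncating division into floor division */
        r += 8;
        q -= 1;
    }
    *byte_disp = q;
    *bit_num = (unsigned)r;
}

/* Hypothetical reader: base points at the byte BitBase refers to; the caller must
   ensure base + byte_disp lies inside a valid buffer. */
static unsigned memory_bit(const uint8_t *base, int64_t bit_offset)
{
    int64_t byte_disp;
    unsigned bit_num;
    locate_memory_bit(bit_offset, &byte_disp, &bit_num);
    return (base[byte_disp] >> bit_num) & 1u;
}
```

For BitOffset = +13, the addressed bit is bit 5 of the byte at BitBase + 1. For BitOffset = -11, it is bit 5 of the byte at BitBase - 2.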


3.1.1.10 Intel® C/C++ Compiler Intrinsics Equivalents Section

The Intel C/C++ compiler intrinsic functions give access to the full power of the Intel Architecture Instruction Set,
while allowing the compiler to optimize register allocation and instruction scheduling for faster execution. Most of
these functions are associated with a single IA instruction, although some may generate multiple instructions or
different instructions depending upon how they are used. In particular, these functions are used to invoke
instructions that perform operations on vector registers that can hold multiple data elements. These SIMD
instructions use the following data types.

• __m128, __m256 and __m512 can represent 4, 8 or 16 packed single-precision floating-point values, and are
used with the vector registers and SSE, AVX, or AVX-512 instruction set extension families. The __m128 data
type is also used with various single-precision floating-point scalar instructions that perform calculations using
only the lowest 32 bits of a vector register; the remaining bits of the result come from one of the sources or are
set to zero depending upon the instruction.

• __m128d, __m256d and __m512d can represent 2, 4 or 8 packed double-precision floating-point values, and
are used with the vector registers and SSE, AVX, or AVX-512 instruction set extension families. The __m128d
data type is also used with various double-precision floating-point scalar instructions that perform calculations
using only the lowest 64 bits of a vector register; the remaining bits of the result come from one of the sources
or are set to zero depending upon the instruction.

• __m128i, __m256i and __m512i can represent integer data in bytes, words, doublewords, quadwords, and
occasionally larger data types.
Each of these data types incorporates in its name the number of bits it can hold. For example, the __m128 type
holds 128 bits, and because each single-precision floating-point value is 32 bits long the __m128 type holds
(128/32) or four values. Normally the compiler will allocate memory for these data types on an even multiple of the
size of the type. Such aligned memory locations may be faster to read and write than locations at other addresses.

These SIMD data types are not basic Standard C data types or C++ objects, so they may be used only with the
assignment operator, passed as function arguments, and returned from a function call. If you access the internal
members of these types directly, or indirectly by using them in a union, there may be side effects affecting
optimization, so it is recommended to use them only with the SIMD instruction intrinsic functions described in this
manual or the Intel C/C++ compiler documentation.

Many intrinsic function names are prefixed with an indicator of the vector length and suffixed by an indicator of
the vector element data type, although some functions do not follow the rules below. The prefixes are:

• _mm_ indicates that the function operates on 128-bit (or sometimes 64-bit) vectors. 

• _mm256_ indicates the function operates on 256-bit vectors. 

• _mm512_ indicates that the function operates on 512-bit vectors. 

The suffixes include: 

• _ps, which indicates a function that operates on packed single-precision floating-point data. Packed
single-precision floating-point data corresponds to arrays of the C/C++ type float with either 4, 8 or 16 elements.
Values of this type can be loaded from an array using the _mm_loadu_ps, _mm256_loadu_ps, or
_mm512_loadu_ps functions, or created from individual values using _mm_set_ps, _mm256_set_ps, or
_mm512_set_ps functions, and they can be stored in an array using _mm_storeu_ps, _mm256_storeu_ps, or
_mm512_storeu_ps.

• _ss, which indicates a function that operates on scalar single-precision floating-point data. Single-precision
floating-point data corresponds to the C/C++ type float, and values of type float can be converted to type
__m128 for use with these functions using the _mm_set_ss function, and converted back using the
_mm_cvtss_f32 function. When used with functions that operate on packed single-precision floating-point data,
the scalar element corresponds with the first packed value.

• _pd, which indicates a function that operates on packed double-precision floating-point data. Packed
double-precision floating-point data corresponds to arrays of the C/C++ type double with either 2, 4, or 8 elements.
Values of this type can be loaded from an array using the _mm_loadu_pd, _mm256_loadu_pd, or
_mm512_loadu_pd functions, or created from individual values using _mm_set_pd, _mm256_set_pd, or
_mm512_set_pd functions, and they can be stored in an array using _mm_storeu_pd, _mm256_storeu_pd, or
_mm512_storeu_pd.

• _sd, which indicates a function that operates on scalar double-precision floating-point data. Double-precision
floating-point data corresponds to the C/C++ type double, and values of type double can be converted to type
__m128d for use with these functions using the _mm_set_sd function, and converted back using the
_mm_cvtsd_f64 function. When used with functions that operate on packed double-precision floating-point
data, the scalar element corresponds with the first packed value.

• _epi8, which indicates a function that operates on packed 8-bit signed integer values. Packed 8-bit signed 
integers correspond to an array of signed char with 16, 32 or 64 elements. Values of this type can be created 
from individual elements using _mm_set_epi8, _mm256_set_epi8, or _mm512_set_epi8 functions. 

• _epi16, which indicates a function that operates on packed 16-bit signed integer values. Packed 16-bit signed
integers correspond to an array of short with 8, 16 or 32 elements. Values of this type can be created from
individual elements using _mm_set_epi16, _mm256_set_epi16, or _mm512_set_epi16 functions.

• _epi32, which indicates a function that operates on packed 32-bit signed integer values. Packed 32-bit signed 
integers correspond to an array of int with 4, 8 or 16 elements. Values of this type can be created from 
individual elements using _mm_set_epi32, _mm256_set_epi32, or _mm512_set_epi32 functions. 

• _epi64, which indicates a function that operates on packed 64-bit signed integer values. Packed 64-bit signed
integers correspond to an array of long long (or long if it is a 64-bit data type) with 2, 4 or 8 elements. Values
of this type can be created from individual elements using _mm_set_epi64x, _mm256_set_epi64x, or
_mm512_set_epi64 functions.

• _epu8, which indicates a function that operates on packed 8-bit unsigned integer values. Packed 8-bit unsigned 
integers correspond to an array of unsigned char with 16, 32 or 64 elements. 
• _epu16, which indicates a function that operates on packed 16-bit unsigned integer values. Packed 16-bit
unsigned integers correspond to an array of unsigned short with 8, 16 or 32 elements.

• _epu32, which indicates a function that operates on packed 32-bit unsigned integer values. Packed 32-bit 
unsigned integers correspond to an array of unsigned with 4, 8 or 16 elements. 

• _epu64, which indicates a function that operates on packed 64-bit unsigned integer values. Packed 64-bit 
unsigned integers correspond to an array of unsigned long long (or unsigned long if it is a 64-bit data type) with 
2, 4 or 8 elements. 

• _si128, which indicates a function that operates on a single 128-bit value of type __m128i.

• _si256, which indicates a function that operates on a single 256-bit value of type __m256i.

• _si512, which indicates a function that operates on a single 512-bit value of type __m512i.

Values of any packed integer type can be loaded from an array using the _mm_loadu_si128,
_mm256_loadu_si256, or _mm512_loadu_si512 functions, and they can be stored in an array using
_mm_storeu_si128, _mm256_storeu_si256, or _mm512_storeu_si512.
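The following example shows how the prefix and suffix rules combine in practice. This is a minimal sketch that assumes an x86 target and a compiler providing `<emmintrin.h>` (SSE2). The wrapper names add_epi32x4 and add_ss are hypothetical. The intrinsics are those named in this section, plus `_mm_add_epi32` and `_mm_add_ss`.

```c
#include <emmintrin.h>  /* SSE2 (also brings in the SSE header); x86/x86-64 only */
#include <stdint.h>

/* Hypothetical wrapper: out[i] = a[i] + b[i] for four packed 32-bit integers.
   _epi32 arithmetic on __m128i values, with unaligned 128-bit loads and stores (_si128). */
static void add_epi32x4(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_add_epi32(va, vb));
}

/* Hypothetical wrapper for scalar single precision (_ss): only the lowest element of
   each __m128 takes part, and _mm_cvtss_f32 converts that element back to float. */
static float add_ss(float x, float y)
{
    __m128 vx = _mm_set_ss(x);
    __m128 vy = _mm_set_ss(y);
    return _mm_cvtss_f32(_mm_add_ss(vx, vy));
}
```

The unaligned load and store forms are used so the arrays need no particular alignment. The aligned variants may be faster when alignment is guaranteed.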

These functions and data types are used with the SSE, AVX, and AVX-512 instruction set extension families. In
addition, there are similar functions that correspond to MMX instructions. These are less frequently used because
they require additional state management (the EMMS instruction must be executed before subsequent x87 FPU
code) and only operate on 64-bit packed integer values.
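As an illustration of these naming conventions, the following sketch loads two arrays of 32-bit integers with _mm_loadu_si128, adds them lane by lane with _mm_add_epi32, and builds a packed 64-bit value with _mm_set_epi64x. It assumes an x86-64 compiler that provides <emmintrin.h> (SSE2 is part of the x86-64 baseline). The helper names add4_epi32 and make_epi64 are hypothetical.

```c
#include <emmintrin.h> /* SSE2: __m128i, _mm_loadu_si128, _mm_add_epi32, ... */
#include <stdint.h>

/* Adds four packed 32-bit signed integers element by element. */
void add4_epi32(const int32_t a[4], const int32_t b[4], int32_t out[4])
{
    __m128i va = _mm_loadu_si128((const __m128i *)a); /* _si128: whole 128-bit load */
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    __m128i vs = _mm_add_epi32(va, vb);               /* _epi32: four 32-bit lanes */
    _mm_storeu_si128((__m128i *)out, vs);             /* unaligned 128-bit store */
}

/* Builds a packed 64-bit value from individual elements and stores it. */
void make_epi64(int64_t hi, int64_t lo, int64_t out[2])
{
    __m128i v = _mm_set_epi64x(hi, lo);               /* highest element first */
    _mm_storeu_si128((__m128i *)out, v);
}
```

Because _mm_set_epi64x takes its arguments from the highest element down, make_epi64(7, -3, p) stores -3 in p[0] and 7 in p[1].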

The declarations of Intel C/C++ compiler intrinsic functions may reference some non-standard data types, such as
__int64. The C Standard header stdint.h defines similar platform-independent types, and the documentation for
that header gives characteristics that apply to corresponding non-standard types according to the following table.


Table 3-3. Standard and Non-standard Data Types


Non-Standard Type   Standard Type (from stdint.h)
__int64             int64_t
unsigned __int64    uint64_t
__int32             int32_t
unsigned __int32    uint32_t
__int16             int16_t
unsigned __int16    uint16_t


For a more detailed description of each intrinsic function and additional information related to its usage, refer to the
online Intel Intrinsics Guide, https://software.intel.com/sites/landingpage/IntrinsicsGuide.

3.1.1.11 Flags Affected Section 

The "Flags Affected" section lists the flags in the EFLAGS register that are affected by the instruction. When a flag 
is cleared, it is equal to 0; when it is set, it is equal to 1. The arithmetic and logical instructions usually assign 
values to the status flags in a uniform manner (see Appendix A, "EFLAGS Cross-Reference," in the Intel® 64 and 
IA-32 Architectures Software Developer's Manual, Volume 1). Non-conventional assignments are described in the 
"Operation" section. The values of flags listed as undefined may be changed by the instruction in an indeterminate 
manner. Flags that are not listed are unchanged by the instruction. 

3.1.1.12 FPU Flags Affected Section

The floating-point instructions have an "FPU Flags Affected" section that describes how each instruction can affect 
the four condition code flags of the FPU status word. 

3.1.1.13 Protected Mode Exceptions Section 

The "Protected Mode Exceptions" section lists the exceptions that can occur when the instruction is executed in 
protected mode and the reasons for the exceptions. Each exception is given a mnemonic that consists of a pound 
sign (#) followed by two letters and an optional error code in parentheses. For example, #GP(0) denotes a general
protection exception with an error code of 0. Table 3-4 associates each two-letter mnemonic with the corresponding
exception vector and name. See Chapter 6, "Procedure Calls, Interrupts, and Exceptions," in the Intel®
64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for a detailed description of the exceptions.

Application programmers should consult the documentation provided with their operating systems to determine 
the actions taken when exceptions occur. 


Table 3-4. Intel 64 and IA-32 General Exceptions


                                                                                    Protected  Real Address  Virtual 8086
Vector  Name                            Source                                      Mode(1)    Mode          Mode
0       #DE—Divide Error                DIV and IDIV instructions.                  Yes        Yes           Yes
1       #DB—Debug                       Any code or data reference.                 Yes        Yes           Yes
3       #BP—Breakpoint                  INT 3 instruction.                          Yes        Yes           Yes
4       #OF—Overflow                    INTO instruction.                           Yes        Yes           Yes
5       #BR—BOUND Range Exceeded        BOUND instruction.                          Yes        Yes           Yes
6       #UD—Invalid Opcode (Undefined   UD2 instruction or reserved opcode.         Yes        Yes           Yes
        Opcode)
7       #NM—Device Not Available (No    Floating-point or WAIT/FWAIT instruction.   Yes        Yes           Yes
        Math Coprocessor)
8       #DF—Double Fault                Any instruction that can generate an        Yes        Yes           Yes
                                        exception, an NMI, or an INTR.
10      #TS—Invalid TSS                 Task switch or TSS access.                  Yes        Reserved      Yes
11      #NP—Segment Not Present         Loading segment registers or accessing      Yes        Reserved      Yes
                                        system segments.
12      #SS—Stack Segment Fault         Stack operations and SS register loads.     Yes        Yes           Yes
13      #GP—General Protection(2)       Any memory reference and other protection   Yes        Yes           Yes
                                        checks.
14      #PF—Page Fault                  Any memory reference.                       Yes        Reserved      Yes
16      #MF—Floating-Point Error (Math  Floating-point or WAIT/FWAIT instruction.   Yes        Yes           Yes
        Fault)
17      #AC—Alignment Check             Any data reference in memory.               Yes        Reserved      Yes
18      #MC—Machine Check               Model dependent machine check errors.       Yes        Yes           Yes
19      #XM—SIMD Floating-Point         SSE/SSE2/SSE3 floating-point instructions.  Yes        Yes           Yes
        Numeric Error

NOTES: 

1. Apply to protected mode, compatibility mode, and 64-bit mode. 

2. In the real-address mode, vector 13 is the segment overrun exception. 
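The mnemonic notation of Table 3-4 can be reproduced in diagnostic code. The sketch below maps a vector number to its mnemonic and appends an error code in parentheses, as in #GP(0). The function name format_exception and its parameters are hypothetical. Whether an error code is present is supplied by the caller rather than derived from the vector.

```c
#include <stddef.h>
#include <stdio.h>

/* Mnemonics from Table 3-4, indexed by vector number. NULL marks vectors
   that the table does not list (2, 9, and 15). */
static const char *const vector_mnemonic[20] = {
    "#DE", "#DB", NULL,  "#BP", "#OF", "#BR", "#UD", "#NM", "#DF", NULL,
    "#TS", "#NP", "#SS", "#GP", "#PF", NULL,  "#MF", "#AC", "#MC", "#XM"
};

/* Writes "#XX" or "#XX(code)" into buf. Returns the snprintf result, or -1
   if the vector is not listed in Table 3-4. */
int format_exception(unsigned vector, int has_error_code, unsigned error_code,
                     char *buf, size_t len)
{
    if (vector >= 20 || vector_mnemonic[vector] == NULL)
        return -1;
    if (has_error_code)
        return snprintf(buf, len, "%s(%u)", vector_mnemonic[vector], error_code);
    return snprintf(buf, len, "%s", vector_mnemonic[vector]);
}
```

For example, format_exception(13, 1, 0, buf, sizeof buf) writes "#GP(0)".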


3.1.1.14 Real-Address Mode Exceptions Section 

The "Real-Address Mode Exceptions" section lists the exceptions that can occur when the instruction is executed in 
real-address mode (see Table 3-4). 

3.1.1.15 Virtual-8086 Mode Exceptions Section

The "Virtual-8086 Mode Exceptions" section lists the exceptions that can occur when the instruction is executed in 
virtual-8086 mode (see Table 3-4). 
3.1.1.16 Floating-Point Exceptions Section

The "Floating-Point Exceptions" section lists exceptions that can occur when an x87 FPU floating-point instruction 
is executed. All of these exception conditions result in a floating-point error exception (#MF, exception 16) being 
generated. Table 3-5 associates a one- or two-letter mnemonic with the corresponding exception name. See
"Floating-Point Exception Conditions" in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual,
Volume 1, for a detailed description of these exceptions.


Table 3-5. x87 FPU Floating-Point Exceptions


Mnemonic  Name                                        Source
#IS       Floating-point invalid operation:           x87 FPU stack overflow or underflow
          stack overflow or underflow
#IA       Floating-point invalid operation:           Invalid FPU arithmetic operation
          invalid arithmetic operation
#Z        Floating-point divide-by-zero               Divide-by-zero
#D        Floating-point denormal operand             Source operand that is a denormal number
#O        Floating-point numeric overflow             Overflow in result
#U        Floating-point numeric underflow            Underflow in result
#P        Floating-point inexact result (precision)   Inexact result (precision)
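To tell #IS from #IA in software, the x87 FPU status word can be inspected. IE (bit 0) reports the invalid-operation condition, SF (bit 6) is set when the cause was a stack fault, and C1 (bit 9) then distinguishes stack overflow (1) from underflow (0). The following is a minimal sketch that classifies a saved status-word value, for example one stored by FNSTSW. The enum and function names are hypothetical.

```c
#include <stdint.h>

/* x87 FPU status word bits used by this sketch. */
#define FSW_IE 0x0001u  /* bit 0: invalid operation (#IS or #IA)            */
#define FSW_SF 0x0040u  /* bit 6: stack fault, qualifies IE                 */
#define FSW_C1 0x0200u  /* bit 9: with SF set, 1 = overflow, 0 = underflow  */

enum x87_invalid_kind {
    X87_NO_INVALID,     /* IE clear */
    X87_IA,             /* invalid arithmetic operation */
    X87_IS_OVERFLOW,    /* stack overflow */
    X87_IS_UNDERFLOW    /* stack underflow */
};

/* Splits an x87 invalid-operation condition into the #IS and #IA cases. */
enum x87_invalid_kind classify_x87_invalid(uint16_t fsw)
{
    if (!(fsw & FSW_IE))
        return X87_NO_INVALID;
    if (!(fsw & FSW_SF))
        return X87_IA;
    return (fsw & FSW_C1) ? X87_IS_OVERFLOW : X87_IS_UNDERFLOW;
}
```

A status word of 0241H (IE, SF, and C1 set) is therefore classified as a stack overflow, and 0001H (IE alone) as #IA.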


3.1.1.17 SIMD Floating-Point Exceptions Section 

The "SIMD Floating-Point Exceptions" section lists exceptions that can occur when an SSE/SSE2/SSE3 floating-point
instruction is executed. All of these exception conditions result in a SIMD floating-point error exception (#XM,
exception 19) being generated. Table 3-6 associates a one-letter mnemonic with the corresponding exception 
name. For a detailed description of these exceptions, refer to "SSE and SSE2 Exceptions", in Chapter 11 of the 
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1. 


Table 3-6. SIMD Floating-Point Exceptions


Mnemonic  Name                              Source
#I        Floating-point invalid operation  Invalid arithmetic operation or source operand
#Z        Floating-point divide-by-zero     Divide-by-zero
#D        Floating-point denormal operand   Source operand that is a denormal number
#O        Floating-point numeric overflow   Overflow in result
#U        Floating-point numeric underflow  Underflow in result
#P        Floating-point inexact result     Inexact result (precision)
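On the SSE side, the six conditions in Table 3-6 are recorded as sticky flags in MXCSR bits 0 through 5 (IE, DE, ZE, OE, UE, PE), with the corresponding mask bits in bits 7 through 12. The sketch below decodes an MXCSR value into the mnemonics above and counts the flagged conditions whose masks are clear. It is a pure bit decoder. On real hardware the value could be read with the _mm_getcsr intrinsic, and 1F80H is the default value with all exceptions masked. The function name decode_mxcsr is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* MXCSR bits 0-5 are the sticky exception flags, in this order;
   bits 7-12 are the matching mask bits. */
static const char *const mxcsr_mnemonic[6] = { "#I", "#D", "#Z", "#O", "#U", "#P" };

/* Writes the mnemonics of all flagged conditions to buf (space separated)
   and returns how many of those conditions are unmasked. */
int decode_mxcsr(uint32_t mxcsr, char *buf, size_t len)
{
    int unmasked = 0;
    if (len == 0)
        return 0;
    buf[0] = '\0';
    for (unsigned bit = 0; bit < 6; bit++) {
        if (!(mxcsr & (1u << bit)))
            continue;                                  /* flag not set */
        if (buf[0] != '\0')
            strncat(buf, " ", len - strlen(buf) - 1);
        strncat(buf, mxcsr_mnemonic[bit], len - strlen(buf) - 1);
        if (!(mxcsr & (1u << (bit + 7))))              /* mask bit clear */
            unmasked++;
    }
    return unmasked;
}
```

For instance, 1D84H is the default value with the divide-by-zero mask (bit 9) cleared and the ZE flag set; decoding it yields "#Z" with one unmasked condition.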


3.1.1.18 Compatibility Mode Exceptions Section

This section lists exceptions that occur within compatibility mode. 

3.1.1.19 64-Bit Mode Exceptions Section 

This section lists exceptions that occur within 64-bit mode. 
3.2 INSTRUCTIONS (A-L)

The remainder of this chapter provides descriptions of Intel 64 and IA-32 instructions (A-L). See also: Chapter 4, 
"Instruction Set Reference, M-U," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 
2B, and Chapter 5, "Instruction Set Reference, V-Z," in the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 2C. 
AAA—ASCII Adjust After Addition


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
37      AAA          NP     Invalid      Valid            ASCII adjust AL after addition.


Instruction Operand Encoding


Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Adjusts the sum of two unpacked BCD values to create an unpacked BCD result. The AL register is the implied 
source and destination operand for this instruction. The AAA instruction is only useful when it follows an ADD 
instruction that adds (binary addition) two unpacked BCD values and stores a byte result in the AL register. The 
AAA instruction then adjusts the contents of the AL register to contain the correct 1-digit unpacked BCD result. 

If the addition produces a decimal carry, the AH register increments by 1, and the CF and AF flags are set. If there 
was no decimal carry, the CF and AF flags are cleared and the AH register is unchanged. In either case, bits 4 
through 7 of the AL register are set to 0. 

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode. 

Operation 

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        IF ((AL AND 0FH) > 9) or (AF = 1)
            THEN
                AX ← AX + 106H;
                AF ← 1;
                CF ← 1;
            ELSE
                AF ← 0;
                CF ← 0;
        FI;
        AL ← AL AND 0FH;
FI;
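The Operation pseudocode can be modeled in C to check results by hand. The sketch below is a hypothetical helper (model_aaa) that treats AX as a 16-bit value and AF and CF as booleans. It covers only legacy and compatibility mode and does not model the undefined flags.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models AAA (legacy and compatibility mode only; AAA is #UD in 64-bit mode).
   *ax holds AH:AL, *af and *cf model the AF and CF flags. */
void model_aaa(uint16_t *ax, bool *af, bool *cf)
{
    if ((*ax & 0x0F) > 9 || *af) {
        *ax = (uint16_t)(*ax + 0x106); /* AX <- AX + 106H */
        *af = true;
        *cf = true;
    } else {
        *af = false;
        *cf = false;
    }
    *ax &= 0xFF0F;                     /* AL <- AL AND 0FH; AH is kept */
}
```

For example, after ADD AL, BL with AL = 08H, BL = 05H, and AH = 00H, AL holds 0DH with AF = 0. Applying the model yields AX = 0103H, the unpacked BCD digits 1 and 3, with AF = CF = 1.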

Flags Affected 

The AF and CF flags are set to 1 if the adjustment results in a decimal carry; otherwise they are set to 0. The OF, 
SF, ZF, and PF flags are undefined. 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as protected mode. 




Compatibility Mode Exceptions 

Same exceptions as protected mode. 

64-Bit Mode Exceptions 

#UD If in 64-bit mode. 


AAD—ASCII Adjust AX Before Division


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
D5 0A   AAD          NP     Invalid      Valid            ASCII adjust AX before division.
D5 ib   AAD imm8     NP     Invalid      Valid            Adjust AX before division to number base imm8.


Instruction Operand Encoding


Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Adjusts two unpacked BCD digits (the least-significant digit in the AL register and the most-significant digit in the 
AH register) so that a division operation performed on the result will yield a correct unpacked BCD value. The AAD 
instruction is only useful when it precedes a DIV instruction that divides (binary division) the adjusted value in the 
AX register by an unpacked BCD value. 

The AAD instruction sets the value in the AL register to (AL + (10 * AH)), and then clears the AH register to 00H.
The value in the AX register is then equal to the binary equivalent of the original unpacked two-digit (base 10)
number in registers AH and AL.

The generalized version of this instruction allows adjustment of two unpacked digits of any number base (see the
"Operation" section below), by setting the imm8 byte to the selected number base (for example, 08H for octal, 0AH
for decimal, or 0CH for base 12 numbers). The AAD mnemonic is interpreted by all assemblers to mean adjust
ASCII (base 10) values. To adjust values in another number base, the instruction must be hand coded in machine
code (D5 imm8).

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode. 

Operation 

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        tempAL ← AL;
        tempAH ← AH;
        AL ← (tempAL + (tempAH * imm8)) AND FFH; (* imm8 is set to 0AH for the AAD mnemonic. *)
        AH ← 0;
FI;

The immediate value (imm8) is taken from the second byte of the instruction.
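A minimal C model of the AAD Operation follows, assuming AX is passed as a 16-bit value. The name model_aad is hypothetical. The second argument plays the role of imm8, so 10 reproduces the AAD mnemonic and other values give the generalized form.

```c
#include <stdint.h>

/* Models AAD imm8 and returns the new AX: AL <- (AL + AH * imm8) AND FFH,
   AH <- 0. The plain AAD mnemonic corresponds to imm8 = 10 (0AH). */
uint16_t model_aad(uint16_t ax, uint8_t imm8)
{
    unsigned al = ax & 0xFFu;
    unsigned ah = ax >> 8;
    return (uint16_t)((al + ah * imm8) & 0xFFu); /* AH = 0 in the result */
}
```

With AH = 04H and AL = 07H (unpacked 47), model_aad(0x0407, 10) returns 002FH, that is, binary 47 in AL with AH cleared, ready for a DIV.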

Flags Affected 

The SF, ZF, and PF flags are set according to the resulting binary value in the AL register; the OF, AF, and CF flags 
are undefined. 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as protected mode. 
Virtual-8086 Mode Exceptions

Same exceptions as protected mode. 

Compatibility Mode Exceptions 

Same exceptions as protected mode. 

64-Bit Mode Exceptions 

#UD If in 64-bit mode. 


AAM—ASCII Adjust AX After Multiply


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
D4 0A   AAM          NP     Invalid      Valid            ASCII adjust AX after multiply.
D4 ib   AAM imm8     NP     Invalid      Valid            Adjust AX after multiply to number base imm8.


Instruction Operand Encoding


Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Adjusts the result of the multiplication of two unpacked BCD values to create a pair of unpacked (base 10) BCD
values. The AX register is the implied source and destination operand for this instruction. The AAM instruction is
only useful when it follows a MUL instruction that multiplies (binary multiplication) two unpacked BCD values and
stores a word result in the AX register. The AAM instruction then adjusts the contents of the AX register to contain
the correct 2-digit unpacked (base 10) BCD result.

The generalized version of this instruction allows adjustment of the contents of the AX to create two unpacked
digits of any number base (see the "Operation" section below). Here, the imm8 byte is set to the selected number
base (for example, 08H for octal, 0AH for decimal, or 0CH for base 12 numbers). The AAM mnemonic is interpreted
by all assemblers to mean adjust to ASCII (base 10) values. To adjust to values in another number base, the
instruction must be hand coded in machine code (D4 imm8).

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode. 

Operation 

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        tempAL ← AL;
        AH ← tempAL / imm8; (* imm8 is set to 0AH for the AAM mnemonic *)
        AL ← tempAL MOD imm8;
FI;

The immediate value (imm8) is taken from the second byte of the instruction.
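The AAM Operation can be sketched in C as well. In this hypothetical model_aam, an imm8 of 0 is reported by a 0 return value and leaves AX unchanged, standing in for the #DE the processor raises. The incoming AH is ignored, as in the pseudocode.

```c
#include <stdint.h>

/* Models AAM imm8 on *ax: AH <- AL / imm8, AL <- AL MOD imm8.
   Returns 0 without changing *ax when imm8 is 0 (the hardware raises #DE). */
int model_aam(uint16_t *ax, uint8_t imm8)
{
    unsigned al = *ax & 0xFFu;       /* only AL is an input */
    if (imm8 == 0)
        return 0;
    *ax = (uint16_t)(((al / imm8) << 8) | (al % imm8));
    return 1;
}
```

For instance, multiplying the unpacked digits 7 and 9 with MUL leaves AX = 003FH (63). The model with imm8 = 10 turns this into AX = 0603H, the unpacked digits 6 and 3.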

Flags Affected 

The SF, ZF, and PF flags are set according to the resulting binary value in the AL register. The OF, AF, and CF flags 
are undefined. 

Protected Mode Exceptions 

#DE If an immediate value of 0 is used. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as protected mode. 
Compatibility Mode Exceptions

Same exceptions as protected mode. 

64-Bit Mode Exceptions 

#UD If in 64-bit mode. 


AAS—ASCII Adjust AL After Subtraction


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
3F      AAS          NP     Invalid      Valid            ASCII adjust AL after subtraction.


Instruction Operand Encoding


Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Adjusts the result of the subtraction of two unpacked BCD values to create an unpacked BCD result. The AL register
is the implied source and destination operand for this instruction. The AAS instruction is only useful when it follows
a SUB instruction that subtracts (binary subtraction) one unpacked BCD value from another and stores a byte
result in the AL register. The AAS instruction then adjusts the contents of the AL register to contain the correct
1-digit unpacked BCD result.

If the subtraction produced a decimal borrow, the AH register decrements by 1, and the CF and AF flags are set. If no
decimal borrow occurred, the CF and AF flags are cleared, and the AH register is unchanged. In either case, the AL
register is left with its top four bits set to 0.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode. 

Operation 

IF 64-bit Mode
    THEN
        #UD;
    ELSE
        IF ((AL AND 0FH) > 9) or (AF = 1)
            THEN
                AX ← AX - 6;
                AH ← AH - 1;
                AF ← 1;
                CF ← 1;
                AL ← AL AND 0FH;
            ELSE
                CF ← 0;
                AF ← 0;
                AL ← AL AND 0FH;
        FI;
FI;
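A C sketch of the AAS Operation follows. The name model_aas is hypothetical, AX is passed as a 16-bit value, and AF and CF are booleans. The undefined flags are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Models AAS (legacy and compatibility mode only).
   *ax holds AH:AL, *af and *cf model the AF and CF flags. */
void model_aas(uint16_t *ax, bool *af, bool *cf)
{
    if ((*ax & 0x0F) > 9 || *af) {
        *ax = (uint16_t)(*ax - 6);     /* AX <- AX - 6 (wraps modulo 2^16) */
        *ax = (uint16_t)(*ax - 0x100); /* AH <- AH - 1 */
        *af = true;
        *cf = true;
    } else {
        *af = false;
        *cf = false;
    }
    *ax &= 0xFF0F;                     /* AL <- AL AND 0FH in both branches */
}
```

Starting from AH = 01H, AL = 03H (unpacked 13), SUB AL, 5 leaves AL = FEH with AF = CF = 1. The model then produces AX = 0008H, that is, 13 - 5 = 8 with AH decremented to 0.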

Flags Affected 

The AF and CF flags are set to 1 if there is a decimal borrow; otherwise, they are cleared to 0. The OF, SF, ZF, and 
PF flags are undefined. 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as protected mode. 
Virtual-8086 Mode Exceptions

Same exceptions as protected mode. 

Compatibility Mode Exceptions 

Same exceptions as protected mode. 

64-Bit Mode Exceptions 

#UD If in 64-bit mode. 


ADC—Add with Carry


Opcode            Instruction        Op/En  64-bit  Compat/   Description
                                            Mode    Leg Mode
14 ib             ADC AL, imm8       I      Valid   Valid     Add with carry imm8 to AL.
15 iw             ADC AX, imm16      I      Valid   Valid     Add with carry imm16 to AX.
15 id             ADC EAX, imm32     I      Valid   Valid     Add with carry imm32 to EAX.
REX.W + 15 id     ADC RAX, imm32     I      Valid   N.E.      Add with carry imm32 sign extended to
                                                              64-bits to RAX.
80 /2 ib          ADC r/m8, imm8     MI     Valid   Valid     Add with carry imm8 to r/m8.
REX + 80 /2 ib    ADC r/m8*, imm8    MI     Valid   N.E.      Add with carry imm8 to r/m8.
81 /2 iw          ADC r/m16, imm16   MI     Valid   Valid     Add with carry imm16 to r/m16.
81 /2 id          ADC r/m32, imm32   MI     Valid   Valid     Add with CF imm32 to r/m32.
REX.W + 81 /2 id  ADC r/m64, imm32   MI     Valid   N.E.      Add with CF imm32 sign extended to 64-bits
                                                              to r/m64.
83 /2 ib          ADC r/m16, imm8    MI     Valid   Valid     Add with CF sign-extended imm8 to r/m16.
83 /2 ib          ADC r/m32, imm8    MI     Valid   Valid     Add with CF sign-extended imm8 into r/m32.
REX.W + 83 /2 ib  ADC r/m64, imm8    MI     Valid   N.E.      Add with CF sign-extended imm8 into r/m64.
10 /r             ADC r/m8, r8       MR     Valid   Valid     Add with carry byte register to r/m8.
REX + 10 /r       ADC r/m8*, r8*     MR     Valid   N.E.      Add with carry byte register to r/m8.
11 /r             ADC r/m16, r16     MR     Valid   Valid     Add with carry r16 to r/m16.
11 /r             ADC r/m32, r32     MR     Valid   Valid     Add with CF r32 to r/m32.
REX.W + 11 /r     ADC r/m64, r64     MR     Valid   N.E.      Add with CF r64 to r/m64.
12 /r             ADC r8, r/m8       RM     Valid   Valid     Add with carry r/m8 to byte register.
REX + 12 /r       ADC r8*, r/m8*     RM     Valid   N.E.      Add with carry r/m8 to byte register.
13 /r             ADC r16, r/m16     RM     Valid   Valid     Add with carry r/m16 to r16.
13 /r             ADC r32, r/m32     RM     Valid   Valid     Add with CF r/m32 to r32.
REX.W + 13 /r     ADC r64, r/m64     RM     Valid   N.E.      Add with CF r/m64 to r64.


NOTES: 

*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding 


Op/En  Operand 1          Operand 2      Operand 3  Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA         NA
MR     ModRM:r/m (r, w)   ModRM:reg (r)  NA         NA
MI     ModRM:r/m (r, w)   imm8           NA         NA
I      AL/AX/EAX/RAX      imm8           NA         NA


Description 

Adds the destination operand (first operand), the source operand (second operand), and the carry (CF) flag and 
stores the result in the destination operand. The destination operand can be a register or a memory location; the 
source operand can be an immediate, a register, or a memory location. (However, two memory operands cannot be 
used in one instruction.) The state of the CF flag represents a carry from a previous addition. When an immediate 
value is used as an operand, it is sign-extended to the length of the destination operand format. 
The ADC instruction does not distinguish between signed or unsigned operands. Instead, the processor evaluates
the result for both data types and sets the OF and CF flags to indicate a carry in the signed or unsigned result, 
respectively. The SF flag indicates the sign of the signed result. 

The ADC instruction is usually executed as part of a multibyte or multiword addition in which an ADD instruction is 
followed by an ADC instruction. 
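The ADD-then-ADC pattern can be illustrated in portable C. The sketch below is a hypothetical model and does not use the _addcarry_u32 intrinsic listed later in this section. adc32 computes DEST + SRC + CF in a 64-bit temporary, returns the low 32 bits, and derives CF (unsigned carry out) and OF (signed overflow) from the result. add96 chains three such steps over little-endian 32-bit words, starting with CF = 0 so that the first step behaves like ADD.

```c
#include <stdint.h>

/* One ADC step on 32-bit operands: returns DEST + SRC + CF (mod 2^32),
   updates *cf with the unsigned carry out and *of with signed overflow. */
static uint32_t adc32(uint32_t dest, uint32_t src, int *cf, int *of)
{
    uint64_t wide = (uint64_t)dest + src + (uint64_t)(*cf != 0);
    uint32_t res = (uint32_t)wide;
    *cf = (int)(wide >> 32);
    /* OF: both operands differ in sign from the result */
    *of = (int)((((dest ^ res) & (src ^ res)) >> 31) & 1u);
    return res;
}

/* Adds two 96-bit integers held as three little-endian 32-bit words.
   Returns the final carry; *of receives the signed-overflow flag of the
   most significant step. */
int add96(const uint32_t a[3], const uint32_t b[3], uint32_t sum[3], int *of)
{
    int cf = 0;                                  /* first step acts as ADD */
    for (int i = 0; i < 3; i++)
        sum[i] = adc32(a[i], b[i], &cf, of);     /* later steps act as ADC */
    return cf;
}
```

Adding 1 to 0000_0000_FFFF_FFFF_FFFF_FFFFH propagates the carry through both low words and yields 0000_0001_0000_0000_0000_0000H with no final carry.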

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

DEST ← DEST + SRC + CF;


Intel C/C++ Compiler Intrinsic Equivalent 

ADC: extern unsigned char _addcarry_u8(unsigned char c_in, unsigned char src1, unsigned char src2, unsigned char *sum_out);

ADC: extern unsigned char _addcarry_u16(unsigned char c_in, unsigned short src1, unsigned short src2, unsigned short *sum_out);

ADC: extern unsigned char _addcarry_u32(unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);

ADC: extern unsigned char _addcarry_u64(unsigned char c_in, unsigned __int64 src1, unsigned __int64 src2, unsigned __int64 *sum_out);

Flags Affected 

The OF, SF, ZF, AF, CF, and PF flags are set according to the result. 


Protected Mode Exceptions 

#GP(0) If the destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used but the destination is not a memory operand. 


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.
Compatibility Mode Exceptions

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 
ADCX — Unsigned Integer Addition of Two Operands with Carry Flag


Opcode/                Op/En  64/32-bit  CPUID    Description
Instruction                   Mode       Feature
                              Support    Flag
66 0F 38 F6 /r         RM     V/V        ADX      Unsigned addition of r32 with CF, r/m32 to r32,
ADCX r32, r/m32                                   writes CF.
66 REX.W 0F 38 F6 /r   RM     V/NE       ADX      Unsigned addition of r64 with CF, r/m64 to r64,
ADCX r64, r/m64                                   writes CF.


Instruction Operand Encoding 


Op/En  Operand 1          Operand 2      Operand 3  Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA         NA


Description 

Performs an unsigned addition of the destination operand (first operand), the source operand (second operand) 
and the carry-flag (CF) and stores the result in the destination operand. The destination operand is a general- 
purpose register, whereas the source operand can be a general-purpose register or memory location. The state of 
CF can represent a carry from a previous addition. The instruction sets the CF flag with the carry generated by the 
unsigned addition of the operands. 

ADCX is intended for multi-precision addition, in which a series of operands is added along a carry chain. At the
beginning of the chain, CF must be placed in the desired initial state. This state is often 0, which can be achieved
with an instruction that clears CF (for example, XOR).
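A portable sketch of such a carry chain follows, assuming 64-bit limbs stored least-significant first. adcx64 and mp_add are hypothetical names. adcx64 mirrors the ADCX Operation (CF:DEST ← DEST + SRC + CF) and produces only a carry, and the chain starts with CF cleared. Compilers expose the instruction itself through the _addcarryx_u64 intrinsic listed in this section; this sketch does not use it.

```c
#include <stddef.h>
#include <stdint.h>

/* Models ADCX r64, r/m64: CF:DEST <- DEST + SRC + CF. Only the carry is
   produced; the instruction leaves the other flags unmodified. */
static uint64_t adcx64(uint64_t dest, uint64_t src, unsigned char *cf)
{
    uint64_t t = dest + src;              /* wraps modulo 2^64 */
    unsigned char c1 = t < dest;          /* carry from dest + src */
    uint64_t r = t + *cf;
    unsigned char c2 = r < t;             /* carry from adding CF */
    *cf = (unsigned char)(c1 | c2);       /* at most one of the two is set */
    return r;
}

/* dst[0..n) += src[0..n), least-significant limb first; returns the carry out. */
unsigned char mp_add(uint64_t *dst, const uint64_t *src, size_t n)
{
    unsigned char cf = 0;                 /* chain starts with CF cleared */
    for (size_t i = 0; i < n; i++)
        dst[i] = adcx64(dst[i], src[i], &cf);
    return cf;
}
```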

This instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in
64-bit mode.

In 64-bit mode, the default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to
additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits.

ADCX executes normally either inside or outside a transaction region. 

Note: ADCX defines the OF flag differently than the ADD/ADC instructions as defined in Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 2A: ADCX leaves OF unmodified, whereas ADD and ADC set it
according to the result.

Operation 

IF OperandSize is 64-bit
    THEN CF:DEST[63:0] ← DEST[63:0] + SRC[63:0] + CF;
    ELSE CF:DEST[31:0] ← DEST[31:0] + SRC[31:0] + CF;
FI;

Flags Affected

CF is updated based on the result. OF, SF, ZF, AF, and PF flags are unmodified.

Intel C/C++ Compiler Intrinsic Equivalent

unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);
unsigned char _addcarryx_u64 (unsigned char c_in, unsigned __int64 src1, unsigned __int64 src2, unsigned __int64 *sum_out);

SIMD Floating-Point Exceptions

None

Protected Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment
selector.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.


Real-Address Mode Exceptions 


#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.


Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) For an illegal address in the SS segment.

#GP(0) If any part of the operand lies outside the effective address space from 0 to FFFFH.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
ADD—Add


Opcode            Instruction        Op/En  64-bit  Compat/   Description
                                            Mode    Leg Mode
04 ib             ADD AL, imm8       I      Valid   Valid     Add imm8 to AL.
05 iw             ADD AX, imm16      I      Valid   Valid     Add imm16 to AX.
05 id             ADD EAX, imm32     I      Valid   Valid     Add imm32 to EAX.
REX.W + 05 id     ADD RAX, imm32     I      Valid   N.E.      Add imm32 sign-extended to 64-bits to RAX.
80 /0 ib          ADD r/m8, imm8     MI     Valid   Valid     Add imm8 to r/m8.
REX + 80 /0 ib    ADD r/m8*, imm8    MI     Valid   N.E.      Add imm8 to r/m8.
81 /0 iw          ADD r/m16, imm16   MI     Valid   Valid     Add imm16 to r/m16.
81 /0 id          ADD r/m32, imm32   MI     Valid   Valid     Add imm32 to r/m32.
REX.W + 81 /0 id  ADD r/m64, imm32   MI     Valid   N.E.      Add imm32 sign-extended to 64-bits to r/m64.
83 /0 ib          ADD r/m16, imm8    MI     Valid   Valid     Add sign-extended imm8 to r/m16.
83 /0 ib          ADD r/m32, imm8    MI     Valid   Valid     Add sign-extended imm8 to r/m32.
REX.W + 83 /0 ib  ADD r/m64, imm8    MI     Valid   N.E.      Add sign-extended imm8 to r/m64.
00 /r             ADD r/m8, r8       MR     Valid   Valid     Add r8 to r/m8.
REX + 00 /r       ADD r/m8*, r8*     MR     Valid   N.E.      Add r8 to r/m8.
01 /r             ADD r/m16, r16     MR     Valid   Valid     Add r16 to r/m16.
01 /r             ADD r/m32, r32     MR     Valid   Valid     Add r32 to r/m32.
REX.W + 01 /r     ADD r/m64, r64     MR     Valid   N.E.      Add r64 to r/m64.
02 /r             ADD r8, r/m8       RM     Valid   Valid     Add r/m8 to r8.
REX + 02 /r       ADD r8*, r/m8*     RM     Valid   N.E.      Add r/m8 to r8.
03 /r             ADD r16, r/m16     RM     Valid   Valid     Add r/m16 to r16.
03 /r             ADD r32, r/m32     RM     Valid   Valid     Add r/m32 to r32.
REX.W + 03 /r     ADD r64, r/m64     RM     Valid   N.E.      Add r/m64 to r64.


NOTES: 

*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding 


Op/En  Operand 1          Operand 2      Operand 3  Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA         NA
MR     ModRM:r/m (r, w)   ModRM:reg (r)  NA         NA
MI     ModRM:r/m (r, w)   imm8           NA         NA
I      AL/AX/EAX/RAX      imm8           NA         NA


Description 

Adds the destination operand (first operand) and the source operand (second operand) and then stores the result 
in the destination operand. The destination operand can be a register or a memory location; the source operand 
can be an immediate, a register, or a memory location. (However, two memory operands cannot be used in one 
instruction.) When an immediate value is used as an operand, it is sign-extended to the length of the destination 
operand format. 

The ADD instruction performs integer addition. It evaluates the result for both signed and unsigned integer operands
and sets the OF and CF flags to indicate a carry (overflow) in the signed or unsigned result, respectively. The
SF flag indicates the sign of the signed result.
This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

DEST ← DEST + SRC;

Flags Affected 

The OF, SF, ZF, AF, CF, and PF flags are set according to the result. 
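The six status flags can be derived from an 8-bit addition as in the following sketch. model_add8 and struct add_flags are hypothetical names. CF and OF are the unsigned and signed overflow indicators, AF is the carry out of bit 3 (the flag AAA consumes), and PF reflects the parity of the low byte of the result.

```c
#include <stdbool.h>
#include <stdint.h>

struct add_flags { bool of, sf, zf, af, cf, pf; };

/* Models ADD r/m8, r8: returns DEST + SRC (mod 256) and the status flags. */
uint8_t model_add8(uint8_t dest, uint8_t src, struct add_flags *f)
{
    unsigned wide = (unsigned)dest + src;
    uint8_t res = (uint8_t)wide;
    unsigned ones = 0;

    f->cf = wide > 0xFFu;                             /* unsigned carry out of bit 7 */
    f->of = ((dest ^ res) & (src ^ res) & 0x80) != 0; /* signed overflow */
    f->sf = (res & 0x80) != 0;                        /* sign of the signed result */
    f->zf = res == 0;
    f->af = ((dest ^ src ^ res) & 0x10) != 0;         /* carry out of bit 3 */
    for (int i = 0; i < 8; i++)                       /* PF: even number of 1 bits */
        ones += (res >> i) & 1u;
    f->pf = (ones % 2) == 0;
    return res;
}
```

For example, 7FH + 01H gives 80H with OF = 1 (signed overflow), CF = 0, and AF = 1, while FFH + 01H gives 00H with CF = 1, OF = 0, ZF = 1, and PF = 1.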

Protected Mode Exceptions 

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions 


#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.
#UD              If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0)           If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)           If the memory address is in a non-canonical form.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used but the destination is not a memory operand.

ADDPD—Add Packed Double-Precision Floating-Point Values


Opcode/Instruction               Op/En  64/32 bit  CPUID      Description
                                        Mode       Feature
                                        Support    Flag
66 0F 58 /r                      RM     V/V        SSE2       Add packed double-precision floating-point values from
ADDPD xmm1, xmm2/m128                                         xmm2/mem to xmm1 and store result in xmm1.
VEX.NDS.128.66.0F.WIG 58 /r      RVM    V/V        AVX        Add packed double-precision floating-point values from
VADDPD xmm1, xmm2, xmm3/m128                                  xmm3/mem to xmm2 and store result in xmm1.
VEX.NDS.256.66.0F.WIG 58 /r      RVM    V/V        AVX        Add packed double-precision floating-point values from
VADDPD ymm1, ymm2, ymm3/m256                                  ymm3/mem to ymm2 and store result in ymm1.
EVEX.NDS.128.66.0F.W1 58 /r      FV     V/V        AVX512VL   Add packed double-precision floating-point values from
VADDPD xmm1 {k1}{z}, xmm2,                         AVX512F    xmm3/m128/m64bcst to xmm2 and store result in xmm1
xmm3/m128/m64bcst                                             with writemask k1.
EVEX.NDS.256.66.0F.W1 58 /r      FV     V/V        AVX512VL   Add packed double-precision floating-point values from
VADDPD ymm1 {k1}{z}, ymm2,                         AVX512F    ymm3/m256/m64bcst to ymm2 and store result in ymm1
ymm3/m256/m64bcst                                             with writemask k1.
EVEX.NDS.512.66.0F.W1 58 /r      FV     V/V        AVX512F    Add packed double-precision floating-point values from
VADDPD zmm1 {k1}{z}, zmm2,                                    zmm3/m512/m64bcst to zmm2 and store result in zmm1
zmm3/m512/m64bcst{er}                                         with writemask k1.



Instruction Operand Encoding

Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA
RVM      ModRM:reg (w)     VEX.vvvv        ModRM:r/m (r)   NA
FV-RVM   ModRM:reg (w)     EVEX.vvvv       ModRM:r/m (r)   NA


Description 

Adds two, four, or eight packed double-precision floating-point values from the first source operand to the
corresponding values in the second source operand and stores the packed double-precision floating-point results in
the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be 
a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 
64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with 
writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM 
register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) 
of the corresponding ZMM register destination are zeroed. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) 
of the corresponding ZMM register destination are zeroed. 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (MAX_VL-1:128) of the corresponding ZMM
register destination are unmodified.
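The legacy SSE and VEX.128 forms compute the same low 128 bits and differ only in what happens to bits MAX_VL-1:128. The C sketch below models one ZMM register as eight double lanes, assuming MAX_VL = 512; `zmm_t`, `addpd_legacy`, and `vaddpd_vex128` are hypothetical names, and rounding and exceptions are not modeled.

```c
/* Hypothetical model of one ZMM register: lane j holds bits 64j+63:64j. */
typedef struct { double lane[8]; } zmm_t;

/* Legacy SSE ADDPD xmm1, xmm2/m128: DEST is also SRC1; lanes 2..7 are left unmodified. */
void addpd_legacy(zmm_t *dest, const zmm_t *src) {
    for (int j = 0; j < 2; j++)
        dest->lane[j] += src->lane[j];
}

/* VEX.128 VADDPD xmm1, xmm2, xmm3/m128: lanes 0..1 get SRC1 + SRC2; lanes 2..7 are zeroed. */
void vaddpd_vex128(zmm_t *dest, const zmm_t *src1, const zmm_t *src2) {
    zmm_t r = {{0.0}};                 /* every lane starts as +0.0 */
    for (int j = 0; j < 2; j++)
        r.lane[j] = src1->lane[j] + src2->lane[j];
    *dest = r;                         /* one final write, so dest may alias a source */
}
```

The practical consequence is that legacy SSE code leaves stale upper state in place, while VEX-encoded code always clears it.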

Operation 

VADDPD (EVEX encoded versions) when src2 operand is a vector register

(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ← SRC1[i+63:i] + SRC2[i+63:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VADDPD (EVEX encoded versions) when src2 operand is a memory source

(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ← SRC1[i+63:i] + SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← SRC1[i+63:i] + SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0


VADDPD (VEX.256 encoded version)

DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[191:128] ← SRC1[191:128] + SRC2[191:128]
DEST[255:192] ← SRC1[255:192] + SRC2[255:192]
DEST[MAX_VL-1:256] ← 0

VADDPD (VEX.128 encoded version)

DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[MAX_VL-1:128] ← 0

ADDPD (128-bit Legacy SSE version)

DEST[63:0] ← DEST[63:0] + SRC[63:0]
DEST[127:64] ← DEST[127:64] + SRC[127:64]
DEST[MAX_VL-1:128] (Unmodified)
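The writemask and broadcast logic of the EVEX memory-source form above can be traced with a scalar C sketch. It assumes VL = 512 (KL = 8), represents k1 as an 8-bit integer, and uses hypothetical parameters `use_mask`, `zeroing`, and `broadcast` to stand for "a writemask is present", EVEX.z, and EVEX.b; `vaddpd_evex_mem` is a hypothetical name, and rounding and floating-point exceptions are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of VADDPD zmm1 {k1}{z}, zmm2, m512/m64bcst with KL = 8 double lanes.
   mem points to eight doubles, or to a single double when broadcast is true. */
void vaddpd_evex_mem(double dest[8], const double src1[8], const double *mem,
                     uint8_t k1, bool use_mask, bool zeroing, bool broadcast) {
    for (int j = 0; j < 8; j++) {
        if (!use_mask || ((k1 >> j) & 1u)) {
            /* EVEX.b = 1 with a memory source: every lane uses SRC2[63:0] */
            double b = broadcast ? mem[0] : mem[j];
            dest[j] = src1[j] + b;
        } else if (zeroing) {
            dest[j] = 0.0;               /* zeroing-masking */
        }
        /* otherwise merging-masking: dest[j] keeps its previous value */
    }
}
```

With k1 = 0FH, the first call in the test below updates lanes 0 through 3 and merges lanes 4 through 7; switching `zeroing` on clears those upper lanes instead.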

Intel C/C++ Compiler Intrinsic Equivalent 

VADDPD __m512d _mm512_add_pd (__m512d a, __m512d b);
VADDPD __m512d _mm512_mask_add_pd (__m512d s, __mmask8 k, __m512d a, __m512d b);
VADDPD __m512d _mm512_maskz_add_pd (__mmask8 k, __m512d a, __m512d b);
VADDPD __m256d _mm256_mask_add_pd (__m256d s, __mmask8 k, __m256d a, __m256d b);
VADDPD __m256d _mm256_maskz_add_pd (__mmask8 k, __m256d a, __m256d b);
VADDPD __m128d _mm_mask_add_pd (__m128d s, __mmask8 k, __m128d a, __m128d b);
VADDPD __m128d _mm_maskz_add_pd (__mmask8 k, __m128d a, __m128d b);
VADDPD __m512d _mm512_add_round_pd (__m512d a, __m512d b, int);
VADDPD __m512d _mm512_mask_add_round_pd (__m512d s, __mmask8 k, __m512d a, __m512d b, int);
VADDPD __m512d _mm512_maskz_add_round_pd (__mmask8 k, __m512d a, __m512d b, int);
VADDPD __m256d _mm256_add_pd (__m256d a, __m256d b);
ADDPD __m128d _mm_add_pd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

VEX-encoded instruction, see Exceptions Type 2. 

EVEX-encoded instruction, see Exceptions Type E2. 

ADDPS—Add Packed Single-Precision Floating-Point Values


Opcode/Instruction               Op/En  64/32 bit  CPUID      Description
                                        Mode       Feature
                                        Support    Flag
0F 58 /r                         RM     V/V        SSE        Add packed single-precision floating-point values from
ADDPS xmm1, xmm2/m128                                         xmm2/m128 to xmm1 and store result in xmm1.
VEX.NDS.128.0F.WIG 58 /r         RVM    V/V        AVX        Add packed single-precision floating-point values from
VADDPS xmm1, xmm2, xmm3/m128                                  xmm3/m128 to xmm2 and store result in xmm1.
VEX.NDS.256.0F.WIG 58 /r         RVM    V/V        AVX        Add packed single-precision floating-point values from
VADDPS ymm1, ymm2, ymm3/m256                                  ymm3/m256 to ymm2 and store result in ymm1.
EVEX.NDS.128.0F.W0 58 /r         FV     V/V        AVX512VL   Add packed single-precision floating-point values from
VADDPS xmm1 {k1}{z}, xmm2,                         AVX512F    xmm3/m128/m32bcst to xmm2 and store result in
xmm3/m128/m32bcst                                             xmm1 with writemask k1.
EVEX.NDS.256.0F.W0 58 /r         FV     V/V        AVX512VL   Add packed single-precision floating-point values from
VADDPS ymm1 {k1}{z}, ymm2,                         AVX512F    ymm3/m256/m32bcst to ymm2 and store result in
ymm3/m256/m32bcst                                             ymm1 with writemask k1.
EVEX.NDS.512.0F.W0 58 /r         FV     V/V        AVX512F    Add packed single-precision floating-point values from
VADDPS zmm1 {k1}{z}, zmm2,                                    zmm3/m512/m32bcst to zmm2 and store result in
zmm3/m512/m32bcst {er}                                        zmm1 with writemask k1.


Instruction Operand Encoding 


Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA
RVM      ModRM:reg (w)     VEX.vvvv        ModRM:r/m (r)   NA
FV-RVM   ModRM:reg (w)     EVEX.vvvv       ModRM:r/m (r)   NA


Description 

Adds four, eight, or sixteen packed single-precision floating-point values from the first source operand to the
corresponding values in the second source operand and stores the packed single-precision floating-point results in
the destination operand.

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be 
a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 
32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with 
writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM 
register or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) 
of the corresponding ZMM register destination are zeroed. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) 
of the corresponding ZMM register destination are zeroed. 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (MAX_VL-1:128) of the corresponding ZMM
register destination are unmodified.

Operation 

VADDPS (EVEX encoded versions) when src2 operand is a register

(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);

3-36 Vol. 2A 


ADDPS—Add Packed Single-Precision Floating-Point Values 





















INSTRUCTION SET REFERENCE, A-L 


FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← SRC1[i+31:i] + SRC2[i+31:i]
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAX_VL-1:VL] ← 0

VADDPS (EVEX encoded versions) when src2 operand is a memory source

(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← SRC1[i+31:i] + SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← SRC1[i+31:i] + SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR;
DEST[MAX_VL-1:VL] ← 0

VADDPS (VEX.256 encoded version)

DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] + SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[159:128] ← SRC1[159:128] + SRC2[159:128]
DEST[191:160] ← SRC1[191:160] + SRC2[191:160]
DEST[223:192] ← SRC1[223:192] + SRC2[223:192]
DEST[255:224] ← SRC1[255:224] + SRC2[255:224]
DEST[MAX_VL-1:256] ← 0

VADDPS (VEX.128 encoded version)

DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] + SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[MAX_VL-1:128] ← 0

ADDPS (128-bit Legacy SSE version)

DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] + SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VADDPS __m512 _mm512_add_ps (__m512 a, __m512 b);
VADDPS __m512 _mm512_mask_add_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);
VADDPS __m512 _mm512_maskz_add_ps (__mmask16 k, __m512 a, __m512 b);
VADDPS __m256 _mm256_mask_add_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);
VADDPS __m256 _mm256_maskz_add_ps (__mmask8 k, __m256 a, __m256 b);
VADDPS __m128 _mm_mask_add_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);
VADDPS __m128 _mm_maskz_add_ps (__mmask8 k, __m128 a, __m128 b);
VADDPS __m512 _mm512_add_round_ps (__m512 a, __m512 b, int);
VADDPS __m512 _mm512_mask_add_round_ps (__m512 s, __mmask16 k, __m512 a, __m512 b, int);
VADDPS __m512 _mm512_maskz_add_round_ps (__mmask16 k, __m512 a, __m512 b, int);
VADDPS __m256 _mm256_add_ps (__m256 a, __m256 b);
ADDPS __m128 _mm_add_ps (__m128 a, __m128 b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

VEX-encoded instruction, see Exceptions Type 2. 

EVEX-encoded instruction, see Exceptions Type E2. 

ADDSD—Add Scalar Double-Precision Floating-Point Values


Opcode/Instruction               Op/En  64/32 bit  CPUID      Description
                                        Mode       Feature
                                        Support    Flag
F2 0F 58 /r                      RM     V/V        SSE2       Add the low double-precision floating-point value from
ADDSD xmm1, xmm2/m64                                          xmm2/mem to xmm1 and store the result in xmm1.
VEX.NDS.128.F2.0F.WIG 58 /r      RVM    V/V        AVX        Add the low double-precision floating-point value from
VADDSD xmm1, xmm2, xmm3/m64                                   xmm3/mem to xmm2 and store the result in xmm1.
EVEX.NDS.LIG.F2.0F.W1 58 /r      T1S    V/V        AVX512F    Add the low double-precision floating-point value from
VADDSD xmm1 {k1}{z}, xmm2,                                    xmm3/m64 to xmm2 and store the result in xmm1 with
xmm3/m64{er}                                                  writemask k1.


Instruction Operand Encoding 


Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA
RVM      ModRM:reg (w)     VEX.vvvv        ModRM:r/m (r)   NA
T1S-RVM  ModRM:reg (w)     EVEX.vvvv       ModRM:r/m (r)   NA


Description 

Adds the low double-precision floating-point values from the second source operand and the first source operand 
and stores the double-precision floating-point result in the destination operand. 

The second source operand can be an XMM register or a 64-bit memory location. The first source and destination 
operands are XMM registers. 

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAX_VL-1:64) of the 
corresponding destination register remain unchanged. 

EVEX and VEX.128 encoded version: The first source operand is encoded by EVEX.vvvv/VEX.vvvv. Bits (127:64) of 
the XMM register destination are copied from corresponding bits in the first source operand. Bits (MAX_VL-1:128) 
of the destination register are zeroed. 

EVEX version: The low quadword element of the destination is updated according to the writemask. 

Software should ensure VADDSD is encoded with VEX.L=0. Encoding VADDSD with VEX.L=1 may encounter 
unpredictable behavior across different processor generations. 
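Taken together, these rules mean the scalar form adds only the low quadword and takes bits 127:64 from the first source. A C sketch, assuming a two-element double array stands in for an XMM register (element 0 = bits 63:0); `vaddsd_evex` and its `k1_bit0`, `use_mask`, and `zeroing` parameters are hypothetical, and the zeroing of bits MAX_VL-1:128 is not modeled.

```c
#include <stdbool.h>

/* Sketch of VADDSD xmm1 {k1}{z}, xmm2, xmm3/m64. Only the low element is added;
   the high element of the destination always comes from SRC1. */
void vaddsd_evex(double dest[2], const double src1[2], double src2_low,
                 bool k1_bit0, bool use_mask, bool zeroing) {
    double high = src1[1];              /* DEST[127:64] <- SRC1[127:64] */
    if (!use_mask || k1_bit0)
        dest[0] = src1[0] + src2_low;   /* DEST[63:0] <- SRC1[63:0] + SRC2[63:0] */
    else if (zeroing)
        dest[0] = 0.0;                  /* zeroing-masking */
    /* merging-masking leaves dest[0] unchanged */
    dest[1] = high;
}
```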

Operation 

VADDSD (EVEX encoded version)

IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] OR *no writemask*
    THEN DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

VADDSD (VEX.128 encoded version)

DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

ADDSD (128-bit Legacy SSE version)

DEST[63:0] ← DEST[63:0] + SRC[63:0]
DEST[MAX_VL-1:64] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VADDSD __m128d _mm_mask_add_sd (__m128d s, __mmask8 k, __m128d a, __m128d b);
VADDSD __m128d _mm_maskz_add_sd (__mmask8 k, __m128d a, __m128d b);
VADDSD __m128d _mm_add_round_sd (__m128d a, __m128d b, int);
VADDSD __m128d _mm_mask_add_round_sd (__m128d s, __mmask8 k, __m128d a, __m128d b, int);
VADDSD __m128d _mm_maskz_add_round_sd (__mmask8 k, __m128d a, __m128d b, int);
ADDSD __m128d _mm_add_sd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

VEX-encoded instruction, see Exceptions Type 3. 

EVEX-encoded instruction, see Exceptions Type E3. 

ADDSS—Add Scalar Single-Precision Floating-Point Values


Opcode/Instruction               Op/En  64/32 bit  CPUID      Description
                                        Mode       Feature
                                        Support    Flag
F3 0F 58 /r                      RM     V/V        SSE        Add the low single-precision floating-point value from
ADDSS xmm1, xmm2/m32                                          xmm2/mem to xmm1 and store the result in xmm1.
VEX.NDS.128.F3.0F.WIG 58 /r      RVM    V/V        AVX        Add the low single-precision floating-point value from
VADDSS xmm1, xmm2, xmm3/m32                                   xmm3/mem to xmm2 and store the result in xmm1.
EVEX.NDS.LIG.F3.0F.W0 58 /r      T1S    V/V        AVX512F    Add the low single-precision floating-point value from
VADDSS xmm1 {k1}{z}, xmm2,                                    xmm3/m32 to xmm2 and store the result in xmm1 with
xmm3/m32{er}                                                  writemask k1.


Instruction Operand Encoding 


Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA
RVM      ModRM:reg (w)     VEX.vvvv        ModRM:r/m (r)   NA
T1S      ModRM:reg (w)     EVEX.vvvv       ModRM:r/m (r)   NA


Description 

Adds the low single-precision floating-point values from the second source operand and the first source operand,
and stores the single-precision floating-point result in the destination operand.

The second source operand can be an XMM register or a 32-bit memory location. The first source and destination
operands are XMM registers.

128-bit Legacy SSE version: The first source and destination operands are the same. Bits (MAX_VL-1:32) of the
corresponding destination register remain unchanged.

EVEX and VEX.128 encoded version: The first source operand is encoded by EVEX.vvvv/VEX.vvvv. Bits (127:32) of 
the XMM register destination are copied from corresponding bits in the first source operand. Bits (MAX_VL-1:128) 
of the destination register are zeroed. 

EVEX version: The low doubleword element of the destination is updated according to the writemask. 

Software should ensure VADDSS is encoded with VEX.L=0. Encoding VADDSS with VEX.L=1 may encounter unpredictable
behavior across different processor generations.
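Because the result is single precision, ADDSS rounds the sum to a 24-bit significand. The C sketch below shows the effect with ordinary `float` arithmetic, assuming IEEE 754 binary32 floats and the default round-to-nearest-even mode (MXCSR rounding control = 00B). `addss_low` is a hypothetical name; the volatile store forces the sum to be rounded to `float` even on targets that evaluate expressions in wider precision.

```c
/* Adds two single-precision values the way the low element of ADDSS does
   under the default round-to-nearest-even mode. */
float addss_low(float a, float b) {
    volatile float r = a + b;   /* the store rounds the sum to binary32 */
    return r;
}
```

Here 16777216 is 2^24. The exact sum 2^24 + 1 is not representable in binary32, so it rounds to the even neighbor 2^24, whereas 2^24 + 2 is exact.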

Operation 

VADDSS (EVEX encoded versions)

IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] OR *no writemask*
    THEN DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

VADDSS DEST, SRC1, SRC2 (VEX.128 encoded version)

DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

ADDSS DEST, SRC (128-bit Legacy SSE version)

DEST[31:0] ← DEST[31:0] + SRC[31:0]
DEST[MAX_VL-1:32] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VADDSS __m128 _mm_mask_add_ss (__m128 s, __mmask8 k, __m128 a, __m128 b);
VADDSS __m128 _mm_maskz_add_ss (__mmask8 k, __m128 a, __m128 b);
VADDSS __m128 _mm_add_round_ss (__m128 a, __m128 b, int);
VADDSS __m128 _mm_mask_add_round_ss (__m128 s, __mmask8 k, __m128 a, __m128 b, int);
VADDSS __m128 _mm_maskz_add_round_ss (__mmask8 k, __m128 a, __m128 b, int);
ADDSS __m128 _mm_add_ss (__m128 a, __m128 b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

VEX-encoded instruction, see Exceptions Type 3. 

EVEX-encoded instruction, see Exceptions Type E3. 

ADDSUBPD—Packed Double-FP Add/Subtract


Opcode/Instruction               Op/En  64/32-bit  CPUID      Description
                                        Mode       Feature
                                                   Flag
66 0F D0 /r                      RM     V/V        SSE3       Add/subtract double-precision floating-point
ADDSUBPD xmm1, xmm2/m128                                      values from xmm2/m128 to xmm1.
VEX.NDS.128.66.0F.WIG D0 /r      RVM    V/V        AVX        Add/subtract packed double-precision
VADDSUBPD xmm1, xmm2, xmm3/m128                               floating-point values from xmm3/mem to
                                                              xmm2 and store result in xmm1.
VEX.NDS.256.66.0F.WIG D0 /r      RVM    V/V        AVX        Add/subtract packed double-precision
VADDSUBPD ymm1, ymm2, ymm3/m256                               floating-point values from ymm3/mem to
                                                              ymm2 and store result in ymm1.


Instruction Operand Encoding 


Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA
RVM      ModRM:reg (w)     VEX.vvvv (r)    ModRM:r/m (r)   NA


Description 

Adds odd-numbered double-precision floating-point values of the first source operand (second operand) with the 
corresponding double-precision floating-point values from the second source operand (third operand); stores the 
result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered 
double-precision floating-point values from the second source operand from the corresponding double-precision 
floating-point values in the first source operand; stores the result into the even-numbered values of the destination
operand. 

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers 
(XMM8-XMM15). 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (VLMAX-1:128) of the corresponding YMM
register destination are unmodified. See Figure 3-3.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of
the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.

ADDSUBPD xmm1, xmm2/m128
    RESULT: xmm1[127:64] = xmm1[127:64] + xmm2/m128[127:64]
            xmm1[63:0]   = xmm1[63:0] - xmm2/m128[63:0]

Figure 3-3. ADDSUBPD—Packed Double-FP Add/Subtract
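In scalar terms, the figure says ADDSUBPD subtracts in the even (low) element and adds in the odd (high) element. A minimal C sketch, assuming a two-element double array stands in for an XMM register (element 0 = bits 63:0); `addsubpd` is a hypothetical name.

```c
/* Models 128-bit ADDSUBPD/VADDSUBPD: element 0 subtracts, element 1 adds.
   Both results are computed before storing, so dest may alias src1 (the legacy SSE form). */
void addsubpd(double dest[2], const double src1[2], const double src2[2]) {
    double lo = src1[0] - src2[0];   /* DEST[63:0]   <- SRC1[63:0]   - SRC2[63:0]   */
    double hi = src1[1] + src2[1];   /* DEST[127:64] <- SRC1[127:64] + SRC2[127:64] */
    dest[0] = lo;
    dest[1] = hi;
}
```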


Operation 

ADDSUBPD (128-bit Legacy SSE version)

DEST[63:0] ← DEST[63:0] - SRC[63:0]
DEST[127:64] ← DEST[127:64] + SRC[127:64]
DEST[VLMAX-1:128] (Unmodified)

VADDSUBPD (VEX.128 encoded version)

DEST[63:0] ← SRC1[63:0] - SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[VLMAX-1:128] ← 0

VADDSUBPD (VEX.256 encoded version)

DEST[63:0] ← SRC1[63:0] - SRC2[63:0]
DEST[127:64] ← SRC1[127:64] + SRC2[127:64]
DEST[191:128] ← SRC1[191:128] - SRC2[191:128]
DEST[255:192] ← SRC1[255:192] + SRC2[255:192]

Intel C/C++ Compiler Intrinsic Equivalent

ADDSUBPD: __m128d _mm_addsub_pd(__m128d a, __m128d b)

VADDSUBPD: __m256d _mm256_addsub_pd (__m256d a, __m256d b)

Exceptions 

When the source operand is a memory operand, it must be aligned on a 16-byte boundary or a general-protection 
exception (#GP) will be generated. 
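To meet the 16-byte alignment requirement above, a C program can request alignment for a buffer and check a pointer before passing it to code that uses the memory form. A sketch assuming C11 `_Alignas`; `operand_buf` and `is_16byte_aligned` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Two doubles (128 bits) placed on a 16-byte boundary via C11 _Alignas. */
static _Alignas(16) double operand_buf[2];

/* True when p is a multiple of 16, i.e. its low four address bits are zero. */
bool is_16byte_aligned(const void *p) {
    return ((uintptr_t)p & 15u) == 0;
}
```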

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal. 

Other Exceptions 

See Exceptions Type 2. 

ADDSUBPS—Packed Single-FP Add/Subtract


Opcode/Instruction               Op/En  64/32-bit  CPUID      Description
                                        Mode       Feature
                                                   Flag
F2 0F D0 /r                      RM     V/V        SSE3       Add/subtract single-precision floating-point
ADDSUBPS xmm1, xmm2/m128                                      values from xmm2/m128 to xmm1.
VEX.NDS.128.F2.0F.WIG D0 /r      RVM    V/V        AVX        Add/subtract single-precision floating-point
VADDSUBPS xmm1, xmm2, xmm3/m128                               values from xmm3/mem to xmm2 and store
                                                              result in xmm1.
VEX.NDS.256.F2.0F.WIG D0 /r      RVM    V/V        AVX        Add/subtract single-precision floating-point
VADDSUBPS ymm1, ymm2, ymm3/m256                               values from ymm3/mem to ymm2 and store
                                                              result in ymm1.


Instruction Operand Encoding 


Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA
RVM      ModRM:reg (w)     VEX.vvvv (r)    ModRM:r/m (r)   NA


Description 

Adds odd-numbered single-precision floating-point values of the first source operand (second operand) with the 
corresponding single-precision floating-point values from the second source operand (third operand); stores the 
result in the odd-numbered values of the destination operand (first operand). Subtracts the even-numbered 
single-precision floating-point values from the second source operand from the corresponding single-precision 
floating-point values in the first source operand; stores the result into the even-numbered values of the destination
operand. 

In 64-bit mode, using a REX prefix in the form of REX.R permits this instruction to access additional registers 
(XMM8-XMM15). 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (VLMAX-1:128) of the corresponding YMM
register destination are unmodified. See Figure 3-4.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of
the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.

ADDSUBPS xmm1, xmm2/m128
    RESULT: xmm1[127:96] = xmm1[127:96] + xmm2/m128[127:96]
            xmm1[95:64]  = xmm1[95:64] - xmm2/m128[95:64]
            xmm1[63:32]  = xmm1[63:32] + xmm2/m128[63:32]
            xmm1[31:0]   = xmm1[31:0] - xmm2/m128[31:0]

Figure 3-4. ADDSUBPS—Packed Single-FP Add/Subtract
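A common use of this subtract-even, add-odd pattern is complex multiplication on interleaved {real, imaginary} pairs, since (a + bi)(c + di) = (ac - bd) + (ad + bc)i needs a subtraction in the real element and an addition in the imaginary element. The scalar C sketch below emulates the instruction rather than calling intrinsics; `addsubps` and `cmul2` are hypothetical names, and element 0 stands for bits 31:0.

```c
/* Hypothetical helper emulating ADDSUBPS on four float elements:
   even elements subtract, odd elements add. */
static void addsubps(float dest[4], const float x[4], const float y[4]) {
    for (int j = 0; j < 4; j++)
        dest[j] = (j % 2 == 0) ? x[j] - y[j] : x[j] + y[j];
}

/* Multiplies two interleaved complex pairs {re0, im0, re1, im1}:
   (a + bi)(c + di) = (ac - bd) + (ad + bc)i. */
void cmul2(float out[4], const float p[4], const float q[4]) {
    float x[4], y[4];
    for (int k = 0; k < 4; k += 2) {
        float a = p[k], b = p[k + 1], c = q[k], d = q[k + 1];
        x[k] = a * c;  x[k + 1] = b * c;   /* [a, b] scaled by c */
        y[k] = b * d;  y[k + 1] = a * d;   /* [b, a] scaled by d */
    }
    addsubps(out, x, y);                   /* even: ac - bd, odd: bc + ad */
}
```

For example, (1 + 2i)(5 + 6i) = -7 + 16i and (3 + 4i)(0 + 1i) = -4 + 3i.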


Operation 

ADDSUBPS (128-bit Legacy SSE version)

DEST[31:0] ← DEST[31:0] - SRC[31:0]
DEST[63:32] ← DEST[63:32] + SRC[63:32]
DEST[95:64] ← DEST[95:64] - SRC[95:64]
DEST[127:96] ← DEST[127:96] + SRC[127:96]
DEST[VLMAX-1:128] (Unmodified)

VADDSUBPS (VEX.128 encoded version)

DEST[31:0] ← SRC1[31:0] - SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] - SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[VLMAX-1:128] ← 0

VADDSUBPS (VEX.256 encoded version)

DEST[31:0] ← SRC1[31:0] - SRC2[31:0]
DEST[63:32] ← SRC1[63:32] + SRC2[63:32]
DEST[95:64] ← SRC1[95:64] - SRC2[95:64]
DEST[127:96] ← SRC1[127:96] + SRC2[127:96]
DEST[159:128] ← SRC1[159:128] - SRC2[159:128]
DEST[191:160] ← SRC1[191:160] + SRC2[191:160]
DEST[223:192] ← SRC1[223:192] - SRC2[223:192]
DEST[255:224] ← SRC1[255:224] + SRC2[255:224]

Intel C/C++ Compiler Intrinsic Equivalent

ADDSUBPS: __m128 _mm_addsub_ps(__m128 a, __m128 b)

VADDSUBPS: __m256 _mm256_addsub_ps (__m256 a, __m256 b)

Exceptions 

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a general- 
protection exception (#GP) will be generated. 

SIMD Floating-Point Exceptions

Overflow, Underflow, Invalid, Precision, Denormal. 

Other Exceptions 

See Exceptions Type 2. 

ADOX — Unsigned Integer Addition of Two Operands with Overflow Flag


Opcode/Instruction               Op/En  64/32 bit  CPUID      Description
                                        Mode       Feature
                                        Support    Flag
F3 0F 38 F6 /r                   RM     V/V        ADX        Unsigned addition of r32 with OF, r/m32 to r32,
ADOX r32, r/m32                                               writes OF.
F3 REX.W 0F 38 F6 /r             RM     V/NE       ADX        Unsigned addition of r64 with OF, r/m64 to r64,
ADOX r64, r/m64                                               writes OF.


Instruction Operand Encoding 


Op/En    Operand 1         Operand 2       Operand 3       Operand 4
RM       ModRM:reg (r, w)  ModRM:r/m (r)   NA              NA


Description 

Performs an unsigned addition of the destination operand (first operand), the source operand (second operand) 
and the overflow-flag (OF) and stores the result in the destination operand. The destination operand is a general- 
purpose register, whereas the source operand can be a general-purpose register or memory location. The state of 
OF represents a carry from a previous addition. The instruction sets the OF flag with the carry generated by the 
unsigned addition of the operands. 

ADOX is intended for multi-precision addition, in which a series of operands is added along a carry chain. A chain
of additions begins with an instruction that clears OF (for example, XOR).
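The carry-chain pattern can be modeled in C by treating the carry returned from each step as OF. This is a sketch, not the `_addcarryx_u32` intrinsic itself: `adox32` and `mp_add` are hypothetical names, limbs are stored least significant first, and only the OF chain is modeled (CF is not touched, which is what lets ADOX run alongside an ADCX chain).

```c
#include <stddef.h>
#include <stdint.h>

/* Models ADOX r32, r/m32: DEST <- DEST + SRC + OF; returns the new OF (0 or 1). */
static unsigned adox32(unsigned of, uint32_t *dest, uint32_t src) {
    uint64_t wide = (uint64_t)*dest + src + (of & 1u);
    *dest = (uint32_t)wide;
    return (unsigned)(wide >> 32);   /* carry out of bit 31 becomes the new OF */
}

/* sum = a + b over n 32-bit limbs; returns the final carry. */
unsigned mp_add(uint32_t *sum, const uint32_t *a, const uint32_t *b, size_t n) {
    unsigned of = 0;                 /* the chain starts with OF cleared (for example, by XOR) */
    for (size_t i = 0; i < n; i++) {
        sum[i] = a[i];
        of = adox32(of, &sum[i], b[i]);
    }
    return of;
}
```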

This instruction is supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit 
mode. 

In 64-bit mode, the default operation size is 32 bits. Using a REX prefix in the form of REX.R permits access to
additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits.

ADOX executes normally either inside or outside a transaction region. 

Note: ADOX defines the CF and OF flags differently than the ADD/ADC instructions as defined in Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 2A.

Operation 

IF OperandSize is 64-bit
    THEN OF:DEST[63:0] ← DEST[63:0] + SRC[63:0] + OF;
    ELSE OF:DEST[31:0] ← DEST[31:0] + SRC[31:0] + OF;
FI;

Flags Affected

OF is updated based on result. CF, SF, ZF, AF and PF flags are unmodified.

Intel C/C++ Compiler Intrinsic Equivalent

unsigned char _addcarryx_u32 (unsigned char c_in, unsigned int src1, unsigned int src2, unsigned int *sum_out);
unsigned char _addcarryx_u64 (unsigned char c_in, unsigned __int64 src1, unsigned __int64 src2, unsigned __int64 *sum_out);

SIMD Floating-Point Exceptions

None

Protected Mode Exceptions

#UD              If the LOCK prefix is used.
                 If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.
#SS(0)           For an illegal address in the SS segment.
#GP(0)           For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.
                 If the DS, ES, FS, or GS register is used to access memory and it contains a null segment
                 selector.
#PF(fault-code)  For a page fault.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.


Real-Address Mode Exceptions 

#UD              If the LOCK prefix is used.
                 If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.
#SS(0)           For an illegal address in the SS segment.
#GP(0)           If any part of the operand lies outside the effective address space from 0 to FFFFH.


Virtual-8086 Mode Exceptions

#UD              If the LOCK prefix is used.
                 If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.
#SS(0)           For an illegal address in the SS segment.
#GP(0)           If any part of the operand lies outside the effective address space from 0 to FFFFH.
#PF(fault-code)  For a page fault.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#UD              If the LOCK prefix is used.
                 If CPUID.(EAX=07H, ECX=0H):EBX.ADX[bit 19] = 0.
#SS(0)           If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)           If the memory address is in a non-canonical form.
#PF(fault-code)  For a page fault.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.



AESDEC—Perform One Round of an AES Decryption Flow


Opcode/Instruction               Op/En  64/32-bit  CPUID      Description
                                        Mode       Feature
                                                   Flag
66 0F 38 DE /r                   RM     V/V        AES        Perform one round of an AES decryption flow,
AESDEC xmm1, xmm2/m128                                        using the Equivalent Inverse Cipher, operating
                                                              on a 128-bit data (state) from xmm1 with a
                                                              128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DE /r    RVM    V/V        Both AES   Perform one round of an AES decryption flow,
VAESDEC xmm1, xmm2, xmm3/m128                      and AVX    using the Equivalent Inverse Cipher, operating
                                                   flags      on a 128-bit data (state) from xmm2 with a
                                                              128-bit round key from xmm3/m128; store
                                                              the result in xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA


Description 

This instruction performs a single round of the AES decryption flow using the Equivalent Inverse Cipher, with the
round key from the second source operand, operating on 128-bit data (state) from the first source operand, and
stores the result in the destination operand.

Use the AESDEC instruction for all but the last decryption round. For the last decryption round, use the AESDECLAST
instruction.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an
XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128)
of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second
source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM
register are zeroed.

Operation

AESDEC
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← InvShiftRows( STATE );
STATE ← InvSubBytes( STATE );
STATE ← InvMixColumns( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] (Unmodified)

VAESDEC
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← InvShiftRows( STATE );
STATE ← InvSubBytes( STATE );
STATE ← InvMixColumns( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] ← 0
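InvShiftRows, as used in the pseudocode above, is a fixed byte permutation of the state. The C sketch below assumes the FIPS 197 layout in which byte i of the 128-bit operand holds state row i mod 4, column i / 4; `shift_rows` and `inv_shift_rows` are illustrative names, not library functions:

```c
#include <stdint.h>

/* ShiftRows: row r of the state is rotated left by r columns.
 * Assumed byte layout: index = row + 4 * column (FIPS 197 order). */
static void shift_rows(uint8_t out[16], const uint8_t in[16])
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            out[r + 4 * c] = in[r + 4 * ((c + r) % 4)];
}

/* InvShiftRows: row r is rotated right by r columns, undoing ShiftRows. */
static void inv_shift_rows(uint8_t out[16], const uint8_t in[16])
{
    for (int c = 0; c < 4; c++)
        for (int r = 0; r < 4; r++)
            out[r + 4 * c] = in[r + 4 * ((c - r + 4) % 4)];
}
```

Applied to the bytes 0 through 15, `shift_rows` yields the familiar order 0, 5, 10, 15, 4, 9, 14, 3, ..., and `inv_shift_rows` restores the original order.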
Intel C/C++ Compiler Intrinsic Equivalent

(V)AESDEC: __m128i _mm_aesdec_si128 (__m128i, __m128i)

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4. 


AESDECLAST—Perform Last Round of an AES Decryption Flow

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DF /r AESDECLAST xmm1, xmm2/m128 | RM | V/V | AES | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DF /r VAESDECLAST xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform the last round of an AES decryption flow, using the Equivalent Inverse Cipher, operating on 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA


Description 

This instruction performs the last round of the AES decryption flow using the Equivalent Inverse Cipher, with the
round key from the second source operand, operating on 128-bit data (state) from the first source operand, and
stores the result in the destination operand.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an
XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128)
of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second
source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM
register are zeroed.

Operation

AESDECLAST
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← InvShiftRows( STATE );
STATE ← InvSubBytes( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] (Unmodified)

VAESDECLAST
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← InvShiftRows( STATE );
STATE ← InvSubBytes( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] ← 0

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESDECLAST: __m128i _mm_aesdeclast_si128 (__m128i, __m128i)

SIMD Floating-Point Exceptions

None 

Other Exceptions 

See Exceptions Type 4. 
AESENC—Perform One Round of an AES Encryption Flow

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DC /r AESENC xmm1, xmm2/m128 | RM | V/V | AES | Perform one round of an AES encryption flow, operating on 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DC /r VAESENC xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform one round of an AES encryption flow, operating on 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA


Description 

This instruction performs a single round of an AES encryption flow using a round key from the second source
operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination
operand.

Use the AESENC instruction for all but the last encryption round. For the last encryption round, use the AESENCLAST
instruction.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an
XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128)
of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second
source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM
register are zeroed.

Operation

AESENC
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← ShiftRows( STATE );
STATE ← SubBytes( STATE );
STATE ← MixColumns( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] (Unmodified)

VAESENC
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← ShiftRows( STATE );
STATE ← SubBytes( STATE );
STATE ← MixColumns( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] ← 0
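MixColumns, the step that distinguishes AESENC from AESENCLAST, multiplies each 4-byte column of the state by a fixed matrix over GF(2^8). A portable C sketch, assuming the FIPS 197 byte layout (byte index = row + 4 * column); `xtime`, `mix_column`, and `mix_columns` are hypothetical helper names:

```c
#include <stdint.h>

/* Multiply by {02} in GF(2^8) modulo the AES polynomial x^8 + x^4 + x^3 + x + 1. */
static uint8_t xtime(uint8_t a)
{
    return (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
}

/* MixColumns on one column a[0..3] (rows 0..3), in place:
 * b_r = {02}*a_r ^ {03}*a_(r+1) ^ a_(r+2) ^ a_(r+3), indices mod 4. */
static void mix_column(uint8_t a[4])
{
    uint8_t t[4];
    for (int r = 0; r < 4; r++) {
        uint8_t a1 = a[(r + 1) % 4];
        /* {03}*a1 is computed as xtime(a1) ^ a1 */
        t[r] = (uint8_t)(xtime(a[r]) ^ xtime(a1) ^ a1 ^ a[(r + 2) % 4] ^ a[(r + 3) % 4]);
    }
    for (int r = 0; r < 4; r++)
        a[r] = t[r];
}

/* Apply MixColumns to all four columns of a 16-byte state. */
static void mix_columns(uint8_t state[16])
{
    for (int c = 0; c < 4; c++)
        mix_column(&state[4 * c]);
}
```

As a hand-checkable example, the column db 13 53 45 maps to 8e 4d a1 bc.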
Intel C/C++ Compiler Intrinsic Equivalent

(V)AESENC: __m128i _mm_aesenc_si128 (__m128i, __m128i)

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4. 


AESENCLAST—Perform Last Round of an AES Encryption Flow

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DD /r AESENCLAST xmm1, xmm2/m128 | RM | V/V | AES | Perform the last round of an AES encryption flow, operating on 128-bit data (state) from xmm1 with a 128-bit round key from xmm2/m128.
VEX.NDS.128.66.0F38.WIG DD /r VAESENCLAST xmm1, xmm2, xmm3/m128 | RVM | V/V | Both AES and AVX flags | Perform the last round of an AES encryption flow, operating on 128-bit data (state) from xmm2 with a 128-bit round key from xmm3/m128; store the result in xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA


Description 

This instruction performs the last round of an AES encryption flow using a round key from the second source
operand, operating on 128-bit data (state) from the first source operand, and stores the result in the destination
operand.

128-bit Legacy SSE version: The first source operand and the destination operand are the same and must be an
XMM register. The second source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128)
of the corresponding YMM destination register remain unchanged.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second
source operand can be an XMM register or a 128-bit memory location. Bits (VLMAX-1:128) of the destination YMM
register are zeroed.

Operation

AESENCLAST
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← ShiftRows( STATE );
STATE ← SubBytes( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] (Unmodified)

VAESENCLAST
STATE ← SRC1;
RoundKey ← SRC2;
STATE ← ShiftRows( STATE );
STATE ← SubBytes( STATE );
DEST[127:0] ← STATE XOR RoundKey;
DEST[VLMAX-1:128] ← 0

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESENCLAST: __m128i _mm_aesenclast_si128 (__m128i, __m128i)

SIMD Floating-Point Exceptions

None 

Other Exceptions 

See Exceptions Type 4. 
AESIMC—Perform the AES InvMixColumn Transformation

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 DB /r AESIMC xmm1, xmm2/m128 | RM | V/V | AES | Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1.
VEX.128.66.0F38.WIG DB /r VAESIMC xmm1, xmm2/m128 | RM | V/V | Both AES and AVX flags | Perform the InvMixColumn transformation on a 128-bit round key from xmm2/m128 and store the result in xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Performs the InvMixColumns transformation on the source operand and stores the result in the destination operand.
The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory
location.

Note: the AESIMC instruction should be applied to the expanded AES round keys (except for the first and last round
key) in order to prepare them for decryption using the "Equivalent Inverse Cipher" (defined in FIPS 197).
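To show what AESIMC computes on a round key, here is a portable C sketch of InvMixColumns over a 16-byte value, assuming the FIPS 197 byte layout (byte index = row + 4 * column); `gf_mul` and `inv_mix_columns` are hypothetical helper names. Applied to round keys 1 through Nr-1 of the encryption key schedule, this transformation yields the round keys used by the Equivalent Inverse Cipher:

```c
#include <stdint.h>

/* GF(2^8) multiply modulo x^8 + x^4 + x^3 + x + 1 (shift-and-add). */
static uint8_t gf_mul(uint8_t a, uint8_t b)
{
    uint8_t p = 0;
    while (b) {
        if (b & 1)
            p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* InvMixColumns on a 16-byte round key or state, in place:
 * b_r = {0e}*a_r ^ {0b}*a_(r+1) ^ {0d}*a_(r+2) ^ {09}*a_(r+3), indices mod 4. */
static void inv_mix_columns(uint8_t s[16])
{
    for (int c = 0; c < 4; c++) {
        uint8_t *a = &s[4 * c], t[4];
        for (int r = 0; r < 4; r++)
            t[r] = (uint8_t)(gf_mul(a[r], 0x0e) ^ gf_mul(a[(r + 1) % 4], 0x0b) ^
                             gf_mul(a[(r + 2) % 4], 0x0d) ^ gf_mul(a[(r + 3) % 4], 0x09));
        for (int r = 0; r < 4; r++)
            a[r] = t[r];
    }
}
```

Since InvMixColumns inverts MixColumns, a column 8e 4d a1 bc maps back to db 13 53 45.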

128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain
unchanged.

VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. 

Operation

AESIMC
DEST[127:0] ← InvMixColumns( SRC );
DEST[VLMAX-1:128] (Unmodified)

VAESIMC
DEST[127:0] ← InvMixColumns( SRC );
DEST[VLMAX-1:128] ← 0;

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESIMC: __m128i _mm_aesimc_si128 (__m128i)

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4; additionally
#UD If VEX.vvvv ≠ 1111B.
AESKEYGENASSIST—AES Round Key Generation Assist

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A DF /r ib AESKEYGENASSIST xmm1, xmm2/m128, imm8 | RMI | V/V | AES | Assist in AES round key generation using an 8-bit Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128 and stores the result in xmm1.
VEX.128.66.0F3A.WIG DF /r ib VAESKEYGENASSIST xmm1, xmm2/m128, imm8 | RMI | V/V | Both AES and AVX flags | Assist in AES round key generation using an 8-bit Round Constant (RCON) specified in the immediate byte, operating on 128 bits of data specified in xmm2/m128 and stores the result in xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (w) | ModRM:r/m (r) | imm8 | NA


Description 

Assists in expanding the AES cipher key by computing steps toward generating a round key for encryption, using
128-bit data specified in the source operand and an 8-bit round constant specified as an immediate, and stores the
result in the destination operand.

The destination operand is an XMM register. The source operand can be an XMM register or a 128-bit memory
location.

128-bit Legacy SSE version: Bits (VLMAX-1:128) of the corresponding YMM destination register remain
unchanged.

VEX.128 encoded version: Bits (VLMAX-1:128) of the destination YMM register are zeroed.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. 

Operation

AESKEYGENASSIST
X3[31:0] ← SRC[127:96];
X2[31:0] ← SRC[95:64];
X1[31:0] ← SRC[63:32];
X0[31:0] ← SRC[31:0];
RCON[31:0] ← ZeroExtend(Imm8[7:0]);
DEST[31:0] ← SubWord(X1);
DEST[63:32] ← RotWord( SubWord(X1) ) XOR RCON;
DEST[95:64] ← SubWord(X3);
DEST[127:96] ← RotWord( SubWord(X3) ) XOR RCON;
DEST[VLMAX-1:128] (Unmodified)

VAESKEYGENASSIST
X3[31:0] ← SRC[127:96];
X2[31:0] ← SRC[95:64];
X1[31:0] ← SRC[63:32];
X0[31:0] ← SRC[31:0];
RCON[31:0] ← ZeroExtend(Imm8[7:0]);
DEST[31:0] ← SubWord(X1);
DEST[63:32] ← RotWord( SubWord(X1) ) XOR RCON;
DEST[95:64] ← SubWord(X3);
DEST[127:96] ← RotWord( SubWord(X3) ) XOR RCON;
DEST[VLMAX-1:128] ← 0;
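The following portable C sketch models the operation above. It assumes the 128-bit operand is held as four little-endian dwords (`src[0]` = SRC[31:0] through `src[3]` = SRC[127:96]) and computes the AES S-box on the fly (multiplicative inverse in GF(2^8) followed by the FIPS 197 affine transformation) instead of using a lookup table; all function names are hypothetical. With this dword layout, RotWord is a right rotation by 8 bits:

```c
#include <stdint.h>

static uint8_t gf_mul(uint8_t a, uint8_t b)   /* multiply mod x^8+x^4+x^3+x+1 */
{
    uint8_t p = 0;
    while (b) {
        if (b & 1) p ^= a;
        a = (uint8_t)((a << 1) ^ ((a & 0x80) ? 0x1b : 0x00));
        b >>= 1;
    }
    return p;
}

/* AES S-box: inverse x^254 (0 maps to 0), then the affine transformation. */
static uint8_t sbox(uint8_t x)
{
    uint8_t inv = 1, base = x;
    for (unsigned e = 254; e; e >>= 1) {       /* square-and-multiply */
        if (e & 1) inv = gf_mul(inv, base);
        base = gf_mul(base, base);
    }
    uint8_t s = inv;
    for (int i = 1; i <= 4; i++)               /* XOR of four byte rotations */
        s ^= (uint8_t)((inv << i) | (inv >> (8 - i)));
    return (uint8_t)(s ^ 0x63);
}

/* SubWord: S-box applied to each byte of a dword. */
static uint32_t sub_word(uint32_t w)
{
    return (uint32_t)sbox((uint8_t)w) | (uint32_t)sbox((uint8_t)(w >> 8)) << 8 |
           (uint32_t)sbox((uint8_t)(w >> 16)) << 16 | (uint32_t)sbox((uint8_t)(w >> 24)) << 24;
}

/* RotWord on a little-endian dword: [a0 a1 a2 a3] -> [a1 a2 a3 a0]. */
static uint32_t rot_word(uint32_t w) { return (w >> 8) | (w << 24); }

/* Model of AESKEYGENASSIST dest, src, imm8. */
static void aeskeygenassist(uint32_t dest[4], const uint32_t src[4], uint8_t imm8)
{
    uint32_t rcon = imm8;                      /* ZeroExtend(Imm8[7:0]) */
    dest[0] = sub_word(src[1]);
    dest[1] = rot_word(sub_word(src[1])) ^ rcon;
    dest[2] = sub_word(src[3]);
    dest[3] = rot_word(sub_word(src[3])) ^ rcon;
}
```

With the FIPS 197 Appendix A.1 cipher key (2b7e1516 28aed2a6 abf71588 09cf4f3c) and imm8 = 01H, `dest[3]` holds the bytes 8b 84 eb 01, the value that key expansion XORs into the first word of round key 1.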

Intel C/C++ Compiler Intrinsic Equivalent

(V)AESKEYGENASSIST: __m128i _mm_aeskeygenassist_si128 (__m128i, const int)

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4; additionally
#UD If VEX.vvvv ≠ 1111B.
AND-Logical AND

Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
24 ib | AND AL, imm8 | I | Valid | Valid | AL AND imm8.
25 iw | AND AX, imm16 | I | Valid | Valid | AX AND imm16.
25 id | AND EAX, imm32 | I | Valid | Valid | EAX AND imm32.
REX.W + 25 id | AND RAX, imm32 | I | Valid | N.E. | RAX AND imm32 sign-extended to 64-bits.
80 /4 ib | AND r/m8, imm8 | MI | Valid | Valid | r/m8 AND imm8.
REX + 80 /4 ib | AND r/m8*, imm8 | MI | Valid | N.E. | r/m8 AND imm8.
81 /4 iw | AND r/m16, imm16 | MI | Valid | Valid | r/m16 AND imm16.
81 /4 id | AND r/m32, imm32 | MI | Valid | Valid | r/m32 AND imm32.
REX.W + 81 /4 id | AND r/m64, imm32 | MI | Valid | N.E. | r/m64 AND imm32 sign extended to 64-bits.
83 /4 ib | AND r/m16, imm8 | MI | Valid | Valid | r/m16 AND imm8 (sign-extended).
83 /4 ib | AND r/m32, imm8 | MI | Valid | Valid | r/m32 AND imm8 (sign-extended).
REX.W + 83 /4 ib | AND r/m64, imm8 | MI | Valid | N.E. | r/m64 AND imm8 (sign-extended).
20 /r | AND r/m8, r8 | MR | Valid | Valid | r/m8 AND r8.
REX + 20 /r | AND r/m8*, r8* | MR | Valid | N.E. | r/m8 AND r8.
21 /r | AND r/m16, r16 | MR | Valid | Valid | r/m16 AND r16.
21 /r | AND r/m32, r32 | MR | Valid | Valid | r/m32 AND r32.
REX.W + 21 /r | AND r/m64, r64 | MR | Valid | N.E. | r/m64 AND r64.
22 /r | AND r8, r/m8 | RM | Valid | Valid | r8 AND r/m8.
REX + 22 /r | AND r8, r/m8 | RM | Valid | N.E. | r8 AND r/m8.
23 /r | AND r16, r/m16 | RM | Valid | Valid | r16 AND r/m16.
23 /r | AND r32, r/m32 | RM | Valid | Valid | r32 AND r/m32.
REX.W + 23 /r | AND r64, r/m64 | RM | Valid | N.E. | r64 AND r/m64.

NOTES:

*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
MR | ModRM:r/m (r, w) | ModRM:reg (r) | NA | NA
MI | ModRM:r/m (r, w) | imm8 | NA | NA
I | AL/AX/EAX/RAX | imm8 | NA | NA


Description 

Performs a bitwise AND operation on the destination (first) and source (second) operands and stores the result in 
the destination operand location. The source operand can be an immediate, a register, or a memory location; the 
destination operand can be a register or a memory location. (However, two memory operands cannot be used in 
one instruction.) Each bit of the result is set to 1 if both corresponding bits of the first and second operands are 1; 
otherwise, it is set to 0. 

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically.
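Compilers expose the locked form of this instruction through atomic operations. A minimal C11 sketch using `atomic_fetch_and_explicit` from `<stdatomic.h>`; `clear_flags` is a hypothetical helper. On x86, compilers commonly emit a LOCK-prefixed AND when the previous value is discarded, as here, and typically fall back to a LOCK CMPXCHG loop when it is needed, but the exact code generated is compiler-dependent:

```c
#include <stdatomic.h>

/* Hypothetical helper: atomically clear the bits in `mask` from a shared word.
 * The previous value is discarded, so an x86 compiler can typically use a
 * single LOCK AND instead of a compare-and-exchange loop. */
static void clear_flags(atomic_uint *word, unsigned mask)
{
    atomic_fetch_and_explicit(word, ~mask, memory_order_seq_cst);
}
```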

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 


Operation

DEST ← DEST AND SRC;

Flags Affected 

The OF and CF flags are cleared; the SF, ZF, and PF flags are set according to the result. The state of the AF flag is 
undefined. 
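The flag rules above can be modeled in portable C. A sketch for the 32-bit form, assuming the flags are returned in a hypothetical `and_flags` struct; AF is left out because its state is undefined:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record; AF is undefined after AND and is not modeled. */
struct and_flags { bool cf, of, sf, zf, pf; };

/* Model of AND r/m32, r32: returns DEST AND SRC and fills in the defined flags. */
static uint32_t and32(uint32_t dest, uint32_t src, struct and_flags *f)
{
    uint32_t r = dest & src;
    f->cf = false;                 /* OF and CF are cleared */
    f->of = false;
    f->sf = (r >> 31) & 1;         /* SF = most significant bit of the result */
    f->zf = (r == 0);
    /* PF: set when the least-significant byte has an even number of 1 bits. */
    uint8_t low = (uint8_t)r;
    low ^= low >> 4;               /* fold the byte so bit 0 holds its parity */
    low ^= low >> 2;
    low ^= low >> 1;
    f->pf = !(low & 1);
    return r;
}
```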


Protected Mode Exceptions

#GP(0)           If the destination operand points to a non-writable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS              If a memory operand effective address is outside the SS segment limit.
#UD              If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.
#UD              If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0)           If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)           If the memory address is in a non-canonical form.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used but the destination is not a memory operand.
ANDN - Logical AND NOT

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDS.LZ.0F38.W0 F2 /r ANDN r32a, r32b, r/m32 | RVM | V/V | BMI1 | Bitwise AND of inverted r32b with r/m32, store result in r32a.
VEX.NDS.LZ.0F38.W1 F2 /r ANDN r64a, r64b, r/m64 | RVM | V/NE | BMI1 | Bitwise AND of inverted r64b with r/m64, store result in r64a.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RVM | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | NA


Description 

Performs a bitwise logical AND of the inverted second operand (the first source operand) with the third operand (the
second source operand). The result is stored in the first operand (destination operand).

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in
64-bit mode. In 64-bit mode, operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An
attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

DEST ← (NOT SRC1) bitwiseAND SRC2;
SF ← DEST[OperandSize -1];
ZF ← (DEST = 0);

Flags Affected 

SF and ZF are updated based on result. OF and CF flags are cleared. AF and PF flags are undefined. 
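A portable C sketch of the 64-bit form and its flag updates follows, assuming a hypothetical `andn_flags` struct (AF and PF are undefined and therefore not modeled). The operation clears, in the second source, every bit that is set in the first source; compilers targeting BMI1 can select ANDN for an expression such as `~a & b`:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record; AF and PF are undefined after ANDN and are not modeled. */
struct andn_flags { bool sf, zf, cf, of; };

/* Model of ANDN r64a, r64b, r/m64: DEST <- (NOT SRC1) AND SRC2. */
static uint64_t andn64(uint64_t src1, uint64_t src2, struct andn_flags *f)
{
    uint64_t dest = ~src1 & src2;
    f->sf = (dest >> 63) & 1;   /* SF <- DEST[OperandSize-1] */
    f->zf = (dest == 0);        /* ZF <- (DEST = 0) */
    f->cf = false;              /* CF and OF are cleared */
    f->of = false;
    return dest;
}
```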

Intel C/C++ Compiler Intrinsic Equivalent 

Auto-generated from high-level language. 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Section 2.5.1, "Exception Conditions for VEX-Encoded GPR Instructions", Table 2-29; additionally 
#UD If VEX.W = 1.
ANDPD—Bitwise Logical AND of Packed Double Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 54 /r ANDPD xmm1, xmm2/m128 | RM | V/V | SSE2 | Return the bitwise logical AND of packed double-precision floating-point values in xmm1 and xmm2/mem.
VEX.NDS.128.66.0F 54 /r VANDPD xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Return the bitwise logical AND of packed double-precision floating-point values in xmm2 and xmm3/mem.
VEX.NDS.256.66.0F 54 /r VANDPD ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Return the bitwise logical AND of packed double-precision floating-point values in ymm2 and ymm3/mem.
EVEX.NDS.128.66.0F.W1 54 /r VANDPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.
EVEX.NDS.256.66.0F.W1 54 /r VANDPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.
EVEX.NDS.512.66.0F.W1 54 /r VANDPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | FV | V/V | AVX512DQ | Return the bitwise logical AND of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Performs a bitwise logical AND of the two, four or eight packed double-precision floating-point values from the first 
source operand and the second source operand, and stores the result in the destination operand. 
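Because ANDPD operates on raw bit patterns, it is often used with constant masks to pick apart IEEE 754 values. A scalar C sketch of one 64-bit lane, assuming `double` is IEEE 754 binary64 and using `memcpy` for the bit reinterpretation; `bits_of`, `double_of`, `and_mask`, and the two mask macros are hypothetical names:

```c
#include <stdint.h>
#include <string.h>

static uint64_t bits_of(double d) { uint64_t u; memcpy(&u, &d, sizeof u); return u; }
static double double_of(uint64_t u) { double d; memcpy(&d, &u, sizeof d); return d; }

/* One ANDPD lane: AND the 64-bit pattern of x with a constant mask. */
static double and_mask(double x, uint64_t mask)
{
    return double_of(bits_of(x) & mask);
}

/* Field masks for IEEE 754 binary64 (hypothetical names). */
#define SIGN_MASK     UINT64_C(0x8000000000000000)  /* bit 63 */
#define EXPONENT_MASK UINT64_C(0x7FF0000000000000)  /* bits 62:52 */
```

ANDing with `SIGN_MASK` leaves only the sign (producing +0.0 or -0.0), and ANDing with `EXPONENT_MASK` keeps only the biased exponent field.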

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be
a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a
64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with
writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register 
or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the 
corresponding ZMM register destination are zeroed. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM 
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) 
of the corresponding ZMM register destination are zeroed. 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The
destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding
register destination are unmodified.
Operation

VANDPD (EVEX encoded versions)
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← SRC1[i+63:i] BITWISE AND SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← SRC1[i+63:i] BITWISE AND SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0
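The writemask and broadcast logic above can be modeled lane by lane in C. This sketch rests on a few assumptions: lanes are held as raw 64-bit patterns, `k1` is passed as an integer whose bit j controls lane j (all ones when there is no writemask), and the zeroing of DEST[MAX_VL-1:VL] is not modeled. `vandpd_model` is a hypothetical name:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of VANDPD (EVEX) on kl 64-bit lanes (kl = 2, 4 or 8).
 * zeroing: true for {z} (zeroing-masking), false for merging-masking.
 * bcst:    true when SRC2 is memory and EVEX.b = 1; src2[0] is then
 *          broadcast to every lane. */
static void vandpd_model(uint64_t *dest, const uint64_t *src1, const uint64_t *src2,
                         size_t kl, uint32_t k1, bool zeroing, bool bcst)
{
    for (size_t j = 0; j < kl; j++) {
        if ((k1 >> j) & 1u)
            dest[j] = src1[j] & (bcst ? src2[0] : src2[j]);
        else if (zeroing)
            dest[j] = 0;
        /* else merging-masking: dest[j] keeps its previous value */
    }
}
```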

VANDPD (VEX.256 encoded version)
DEST[63:0] ← SRC1[63:0] BITWISE AND SRC2[63:0]
DEST[127:64] ← SRC1[127:64] BITWISE AND SRC2[127:64]
DEST[191:128] ← SRC1[191:128] BITWISE AND SRC2[191:128]
DEST[255:192] ← SRC1[255:192] BITWISE AND SRC2[255:192]
DEST[MAX_VL-1:256] ← 0

VANDPD (VEX.128 encoded version)
DEST[63:0] ← SRC1[63:0] BITWISE AND SRC2[63:0]
DEST[127:64] ← SRC1[127:64] BITWISE AND SRC2[127:64]
DEST[MAX_VL-1:128] ← 0

ANDPD (128-bit Legacy SSE version)
DEST[63:0] ← DEST[63:0] BITWISE AND SRC[63:0]
DEST[127:64] ← DEST[127:64] BITWISE AND SRC[127:64]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent

VANDPD __m512d _mm512_and_pd (__m512d a, __m512d b);
VANDPD __m512d _mm512_mask_and_pd (__m512d s, __mmask8 k, __m512d a, __m512d b);
VANDPD __m512d _mm512_maskz_and_pd (__mmask8 k, __m512d a, __m512d b);
VANDPD __m256d _mm256_mask_and_pd (__m256d s, __mmask8 k, __m256d a, __m256d b);
VANDPD __m256d _mm256_maskz_and_pd (__mmask8 k, __m256d a, __m256d b);
VANDPD __m128d _mm_mask_and_pd (__m128d s, __mmask8 k, __m128d a, __m128d b);
VANDPD __m128d _mm_maskz_and_pd (__mmask8 k, __m128d a, __m128d b);
VANDPD __m256d _mm256_and_pd (__m256d a, __m256d b);
ANDPD __m128d _mm_and_pd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions 

None 
Other Exceptions

VEX-encoded instruction, see Exceptions Type 4. 
EVEX-encoded instruction, see Exceptions Type E4. 
ANDPS—Bitwise Logical AND of Packed Single Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
0F 54 /r ANDPS xmm1, xmm2/m128 | RM | V/V | SSE | Return the bitwise logical AND of packed single-precision floating-point values in xmm1 and xmm2/mem.
VEX.NDS.128.0F 54 /r VANDPS xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/mem.
VEX.NDS.256.0F 54 /r VANDPS ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/mem.
EVEX.NDS.128.0F.W0 54 /r VANDPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.
EVEX.NDS.256.0F.W0 54 /r VANDPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.
EVEX.NDS.512.0F.W0 54 /r VANDPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | FV | V/V | AVX512DQ | Return the bitwise logical AND of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Performs a bitwise logical AND of the four, eight or sixteen packed single-precision floating-point values from the 
first source operand and the second source operand, and stores the result in the destination operand. 
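A common use of ANDPS is the absolute-value idiom: ANDing every lane with 7FFFFFFFH clears the sign bits and leaves exponent and mantissa untouched. A portable C sketch of that four-lane operation, assuming `float` is IEEE 754 binary32 and using `memcpy` for the bit reinterpretation; `andps_abs4` is a hypothetical name:

```c
#include <stdint.h>
#include <string.h>

/* Model of ANDPS with a constant mask: out[j] = in[j] AND 7FFFFFFFH, j = 0..3. */
static void andps_abs4(float out[4], const float in[4])
{
    for (int j = 0; j < 4; j++) {
        uint32_t u;
        memcpy(&u, &in[j], sizeof u);    /* reinterpret lane j as raw bits */
        u &= UINT32_C(0x7FFFFFFF);       /* clear the sign bit */
        memcpy(&out[j], &u, sizeof u);
    }
}
```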

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be
a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a
32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with
writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register 
or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the 
corresponding ZMM register destination are zeroed. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM 
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) 
of the corresponding ZMM register destination are zeroed. 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The
destination is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding
ZMM register destination are unmodified.
Operation

VANDPS (EVEX encoded versions)
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← SRC1[i+31:i] BITWISE AND SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← SRC1[i+31:i] BITWISE AND SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0;


VANDPS (VEX.256 encoded version)
DEST[31:0] ← SRC1[31:0] BITWISE AND SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE AND SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE AND SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE AND SRC2[127:96]
DEST[159:128] ← SRC1[159:128] BITWISE AND SRC2[159:128]
DEST[191:160] ← SRC1[191:160] BITWISE AND SRC2[191:160]
DEST[223:192] ← SRC1[223:192] BITWISE AND SRC2[223:192]
DEST[255:224] ← SRC1[255:224] BITWISE AND SRC2[255:224]
DEST[MAX_VL-1:256] ← 0;

VANDPS (VEX.128 encoded version)
DEST[31:0] ← SRC1[31:0] BITWISE AND SRC2[31:0]
DEST[63:32] ← SRC1[63:32] BITWISE AND SRC2[63:32]
DEST[95:64] ← SRC1[95:64] BITWISE AND SRC2[95:64]
DEST[127:96] ← SRC1[127:96] BITWISE AND SRC2[127:96]
DEST[MAX_VL-1:128] ← 0;

ANDPS (128-bit Legacy SSE version)
DEST[31:0] ← DEST[31:0] BITWISE AND SRC[31:0]
DEST[63:32] ← DEST[63:32] BITWISE AND SRC[63:32]
DEST[95:64] ← DEST[95:64] BITWISE AND SRC[95:64]
DEST[127:96] ← DEST[127:96] BITWISE AND SRC[127:96]
DEST[MAX_VL-1:128] (Unmodified)
Intel C/C++ Compiler Intrinsic Equivalent

VANDPS __m512 _mm512_and_ps (__m512 a, __m512 b);
VANDPS __m512 _mm512_mask_and_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);
VANDPS __m512 _mm512_maskz_and_ps (__mmask16 k, __m512 a, __m512 b);
VANDPS __m256 _mm256_mask_and_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);
VANDPS __m256 _mm256_maskz_and_ps (__mmask8 k, __m256 a, __m256 b);
VANDPS __m128 _mm_mask_and_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);
VANDPS __m128 _mm_maskz_and_ps (__mmask8 k, __m128 a, __m128 b);
VANDPS __m256 _mm256_and_ps (__m256 a, __m256 b);
ANDPS __m128 _mm_and_ps (__m128 a, __m128 b);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

VEX-encoded instruction, see Exceptions Type 4. 

EVEX-encoded instruction, see Exceptions Type E4. 
ANDNPD—Bitwise Logical AND NOT of Packed Double Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 55 /r ANDNPD xmm1, xmm2/m128 | RM | V/V | SSE2 | Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm1 and xmm2/mem.
VEX.NDS.128.66.0F 55 /r VANDNPD xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm2 and xmm3/mem.
VEX.NDS.256.66.0F 55 /r VANDNPD ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Return the bitwise logical AND NOT of packed double-precision floating-point values in ymm2 and ymm3/mem.
EVEX.NDS.128.66.0F.W1 55 /r VANDNPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND NOT of packed double-precision floating-point values in xmm2 and xmm3/m128/m64bcst subject to writemask k1.
EVEX.NDS.256.66.0F.W1 55 /r VANDNPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND NOT of packed double-precision floating-point values in ymm2 and ymm3/m256/m64bcst subject to writemask k1.
EVEX.NDS.512.66.0F.W1 55 /r VANDNPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst | FV | V/V | AVX512DQ | Return the bitwise logical AND NOT of packed double-precision floating-point values in zmm2 and zmm3/m512/m64bcst subject to writemask k1.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Performs a bitwise logical AND NOT of the two, four or eight packed double-precision floating-point values from the 
first source operand and the second source operand, and stores the result in the destination operand. 
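To make the operand order concrete, here is a minimal C sketch that models the 128-bit ANDNPD on two double-precision lanes by operating on their bit patterns. It is an illustrative reference model, not Intel's implementation, and it assumes `double` is a 64-bit IEEE 754 type. The helper names `bits_of`, `double_of`, and `andnpd_model` are hypothetical. The NOT is applied to the first source, so passing a sign-bit mask (-0.0) as the first operand clears the sign of the second operand.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: view a double as its 64-bit IEEE 754 bit pattern. */
static uint64_t bits_of(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

/* Hypothetical helper: rebuild a double from a 64-bit bit pattern. */
static double double_of(uint64_t u) {
    double d;
    memcpy(&d, &u, sizeof d);
    return d;
}

/* Reference model of ANDNPD on two 64-bit lanes:
 * dest[j] = (NOT src1[j]) AND src2[j]; only the first source is inverted. */
static void andnpd_model(double dest[2], const double src1[2], const double src2[2]) {
    assert(sizeof(double) == sizeof(uint64_t)); /* the bit-level view assumes 64-bit doubles */
    for (int j = 0; j < 2; j++) {
        dest[j] = double_of(~bits_of(src1[j]) & bits_of(src2[j]));
    }
}
```

With src1 = {-0.0, -0.0} and src2 = {-3.5, 2.25}, the model yields {3.5, 2.25}. This is the sign-mask approach to absolute values. Swapping the arguments would compute (NOT x) AND mask instead.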

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be
a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a
64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with
writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register 
or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the 
corresponding ZMM register destination are zeroed. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM 
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) 
of the corresponding ZMM register destination are zeroed. 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (MAX_VL-1:128) of the corresponding ZMM
register destination are unmodified.
Operation

VANDNPD (EVEX encoded versions)

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← (NOT(SRC1[i+63:i])) BITWISE AND SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← (NOT(SRC1[i+63:i])) BITWISE AND SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VANDNPD (VEX.256 encoded version)

DEST[63:0] ← (NOT(SRC1[63:0])) BITWISE AND SRC2[63:0]
DEST[127:64] ← (NOT(SRC1[127:64])) BITWISE AND SRC2[127:64]
DEST[191:128] ← (NOT(SRC1[191:128])) BITWISE AND SRC2[191:128]
DEST[255:192] ← (NOT(SRC1[255:192])) BITWISE AND SRC2[255:192]
DEST[MAX_VL-1:256] ← 0

VANDNPD (VEX.128 encoded version)

DEST[63:0] ← (NOT(SRC1[63:0])) BITWISE AND SRC2[63:0]
DEST[127:64] ← (NOT(SRC1[127:64])) BITWISE AND SRC2[127:64]
DEST[MAX_VL-1:128] ← 0

ANDNPD (128-bit Legacy SSE version)

DEST[63:0] ← (NOT(DEST[63:0])) BITWISE AND SRC[63:0]
DEST[127:64] ← (NOT(DEST[127:64])) BITWISE AND SRC[127:64]
DEST[MAX_VL-1:128] (Unmodified)
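The EVEX pseudocode combines three mechanisms: a per-lane writemask, merging or zeroing of masked-off lanes, and embedded broadcast. The C sketch below is a hypothetical scalar model of the 512-bit VANDNPD form that works on raw 64-bit lane patterns. The function name and its boolean parameters are illustrative assumptions. It covers only the eight lanes of one 512-bit register and does not model memory faults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VANDNPD512_LANES 8 /* KL = 8 x 64-bit lanes for VL = 512 */

/* Hypothetical scalar model of VANDNPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst.
 * dest:      destination lanes (read for merging-masking, then written)
 * src1:      first source lanes (the inverted operand)
 * src2:      second source lanes; only src2[0] is read when broadcast is true
 * k1:        writemask, bit j controls lane j; 0xFF behaves like *no writemask*
 * zeroing:   true for {z} (zeroing-masking), false for merging-masking
 * broadcast: models EVEX.b == 1 with a memory second source */
static void vandnpd512_model(uint64_t dest[VANDNPD512_LANES],
                             const uint64_t src1[VANDNPD512_LANES],
                             const uint64_t src2[VANDNPD512_LANES],
                             uint8_t k1, bool zeroing, bool broadcast) {
    assert(dest && src1 && src2);
    for (int j = 0; j < VANDNPD512_LANES; j++) {
        if ((k1 >> j) & 1u) {
            uint64_t s2 = broadcast ? src2[0] : src2[j];
            dest[j] = ~src1[j] & s2;
        } else if (zeroing) {
            dest[j] = 0;
        }
        /* else: merging-masking, dest[j] keeps its previous value */
    }
}
```

With zeroing set to false, lanes whose k1 bit is 0 keep their old destination contents. With zeroing set to true, the same lanes are cleared.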

Intel C/C++ Compiler Intrinsic Equivalent

VANDNPD __m512d _mm512_andnot_pd (__m512d a, __m512d b);

VANDNPD __m512d _mm512_mask_andnot_pd (__m512d s, __mmask8 k, __m512d a, __m512d b);

VANDNPD __m512d _mm512_maskz_andnot_pd (__mmask8 k, __m512d a, __m512d b);

VANDNPD __m256d _mm256_mask_andnot_pd (__m256d s, __mmask8 k, __m256d a, __m256d b);

VANDNPD __m256d _mm256_maskz_andnot_pd (__mmask8 k, __m256d a, __m256d b);

VANDNPD __m128d _mm_mask_andnot_pd (__m128d s, __mmask8 k, __m128d a, __m128d b);

VANDNPD __m128d _mm_maskz_andnot_pd (__mmask8 k, __m128d a, __m128d b);

VANDNPD __m256d _mm256_andnot_pd (__m256d a, __m256d b);

ANDNPD __m128d _mm_andnot_pd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions 

None 
Other Exceptions

VEX-encoded instruction, see Exceptions Type 4. 
EVEX-encoded instruction, see Exceptions Type E4. 
ANDNPS—Bitwise Logical AND NOT of Packed Single Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
0F 55 /r ANDNPS xmm1, xmm2/m128 | RM | V/V | SSE | Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm1 and xmm2/mem.
VEX.NDS.128.0F 55 /r VANDNPS xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/mem.
VEX.NDS.256.0F 55 /r VANDNPS ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/mem.
EVEX.NDS.128.0F.W0 55 /r VANDNPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND NOT of packed single-precision floating-point values in xmm2 and xmm3/m128/m32bcst subject to writemask k1.
EVEX.NDS.256.0F.W0 55 /r VANDNPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | FV | V/V | AVX512VL AVX512DQ | Return the bitwise logical AND NOT of packed single-precision floating-point values in ymm2 and ymm3/m256/m32bcst subject to writemask k1.
EVEX.NDS.512.0F.W0 55 /r VANDNPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst | FV | V/V | AVX512DQ | Return the bitwise logical AND NOT of packed single-precision floating-point values in zmm2 and zmm3/m512/m32bcst subject to writemask k1.



Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Performs a bitwise logical AND NOT of the four, eight or sixteen packed single-precision floating-point values from 
the first source operand and the second source operand, and stores the result in the destination operand. 
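In the legacy SSE form, the destination also serves as the first source, and it is the operand that gets inverted. The minimal C sketch below models that 128-bit form on four 32-bit lane patterns. `andnps_model` is a hypothetical name. A typical use is clearing selected lanes with an all-ones/all-zeros lane mask, such as the result of a packed compare.

```c
#include <assert.h>
#include <stdint.h>

/* Reference model of the legacy ANDNPS xmm1, xmm2/m128 form on four 32-bit lanes.
 * dest is both the first source (inverted) and the destination:
 * dest[j] = (NOT dest[j]) AND src[j]. */
static void andnps_model(uint32_t dest[4], const uint32_t src[4]) {
    assert(dest && src);
    for (int j = 0; j < 4; j++) {
        dest[j] = ~dest[j] & src[j];
    }
}
```

Lanes where the mask is all ones are cleared, and the remaining lanes pass the data bits through unchanged. The mask held in the destination is overwritten, unlike the VEX forms, which write a separate destination register.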

EVEX encoded versions: The first source operand is a ZMM/YMM/XMM register. The second source operand can be 
a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 
32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with 
writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand is a YMM register 
or a 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the 
corresponding ZMM register destination are zeroed. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM 
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) 
of the corresponding ZMM register destination are zeroed. 

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (MAX_VL-1:128) of the corresponding ZMM
register destination are unmodified.
Operation

VANDNPS (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)

FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b == 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← (NOT(SRC1[i+31:i])) BITWISE AND SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← (NOT(SRC1[i+31:i])) BITWISE AND SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI;
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0


VANDNPS (VEX.256 encoded version)

DEST[31:0] ← (NOT(SRC1[31:0])) BITWISE AND SRC2[31:0]
DEST[63:32] ← (NOT(SRC1[63:32])) BITWISE AND SRC2[63:32]
DEST[95:64] ← (NOT(SRC1[95:64])) BITWISE AND SRC2[95:64]
DEST[127:96] ← (NOT(SRC1[127:96])) BITWISE AND SRC2[127:96]
DEST[159:128] ← (NOT(SRC1[159:128])) BITWISE AND SRC2[159:128]
DEST[191:160] ← (NOT(SRC1[191:160])) BITWISE AND SRC2[191:160]
DEST[223:192] ← (NOT(SRC1[223:192])) BITWISE AND SRC2[223:192]
DEST[255:224] ← (NOT(SRC1[255:224])) BITWISE AND SRC2[255:224]
DEST[MAX_VL-1:256] ← 0


VANDNPS (VEX.128 encoded version)

DEST[31:0] ← (NOT(SRC1[31:0])) BITWISE AND SRC2[31:0]
DEST[63:32] ← (NOT(SRC1[63:32])) BITWISE AND SRC2[63:32]
DEST[95:64] ← (NOT(SRC1[95:64])) BITWISE AND SRC2[95:64]
DEST[127:96] ← (NOT(SRC1[127:96])) BITWISE AND SRC2[127:96]
DEST[MAX_VL-1:128] ← 0


ANDNPS (128-bit Legacy SSE version)

DEST[31:0] ← (NOT(DEST[31:0])) BITWISE AND SRC[31:0]
DEST[63:32] ← (NOT(DEST[63:32])) BITWISE AND SRC[63:32]
DEST[95:64] ← (NOT(DEST[95:64])) BITWISE AND SRC[95:64]
DEST[127:96] ← (NOT(DEST[127:96])) BITWISE AND SRC[127:96]
DEST[MAX_VL-1:128] (Unmodified)
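When EVEX.b is set and the second source is in memory, a single 32-bit element is reused for every lane. This is the SRC2[31:0] term in the EVEX pseudocode. The C sketch below is a hypothetical scalar model of the 512-bit VANDNPS form with broadcast and no writemask. The names `vandnps512_bcst_model` and `m32` are illustrative assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define VANDNPS512_LANES 16 /* KL = 16 x 32-bit lanes for VL = 512 */

/* Hypothetical scalar model of VANDNPS zmm1, zmm2, m32bcst (EVEX.b = 1, no writemask):
 * the one 32-bit memory element m32 is used as SRC2 for every lane. */
static void vandnps512_bcst_model(uint32_t dest[VANDNPS512_LANES],
                                  const uint32_t src1[VANDNPS512_LANES],
                                  uint32_t m32) {
    assert(dest && src1);
    for (int j = 0; j < VANDNPS512_LANES; j++) {
        dest[j] = ~src1[j] & m32;
    }
}
```

Because the broadcast element is the second source, it is never inverted. Only the register operand SRC1 passes through the NOT.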
Intel C/C++ Compiler Intrinsic Equivalent

VANDNPS __m512 _mm512_andnot_ps (__m512 a, __m512 b);

VANDNPS __m512 _mm512_mask_andnot_ps (__m512 s, __mmask16 k, __m512 a, __m512 b);

VANDNPS __m512 _mm512_maskz_andnot_ps (__mmask16 k, __m512 a, __m512 b);

VANDNPS __m256 _mm256_mask_andnot_ps (__m256 s, __mmask8 k, __m256 a, __m256 b);

VANDNPS __m256 _mm256_maskz_andnot_ps (__mmask8 k, __m256 a, __m256 b);

VANDNPS __m128 _mm_mask_andnot_ps (__m128 s, __mmask8 k, __m128 a, __m128 b);

VANDNPS __m128 _mm_maskz_andnot_ps (__mmask8 k, __m128 a, __m128 b);

VANDNPS __m256 _mm256_andnot_ps (__m256 a, __m256 b);

ANDNPS __m128 _mm_andnot_ps (__m128 a, __m128 b);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

VEX-encoded instruction, see Exceptions Type 4. 

EVEX-encoded instruction, see Exceptions Type E4. 




Vol. 2A 3-75 


INSTRUCTION SET REFERENCE, A-L 


ARPL—Adjust RPL Field of Segment Selector 


Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
63 /r | ARPL r/m16, r16 | NP | N.E. | Valid | Adjust RPL of r/m16 to not less than RPL of r16.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | ModRM:r/m (w) | ModRM:reg (r) | NA | NA


Description 

Compares the RPL fields of two segment selectors. The first operand (the destination operand) contains one 
segment selector and the second operand (source operand) contains the other. (The RPL field is located in bits 0 
and 1 of each operand.) If the RPL field of the destination operand is less than the RPL field of the source operand, 
the ZF flag is set and the RPL field of the destination operand is increased to match that of the source operand. 
Otherwise, the ZF flag is cleared and no change is made to the destination operand. (The destination operand can 
be a word register or a memory location; the source operand must be a word register.) 

The ARPL instruction is provided for use by operating-system procedures (however, it can also be used by applications).
It is generally used to adjust the RPL of a segment selector that has been passed to the operating system by an
application program to match the privilege level of the application program. Here the segment selector passed to
the operating system is placed in the destination operand, and the segment selector for the application program's
code segment is placed in the source operand. (The RPL field in the source operand represents the privilege level of
the application program.) Execution of the ARPL instruction then ensures that the RPL of the segment selector
received by the operating system is no lower (does not have a higher privilege) than the privilege level of the
application program (the segment selector for the application program's code segment can be read from the stack
following a procedure call).
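The selector-adjustment scenario described above can be modeled in a few lines of C. This is a hypothetical sketch, not an operating-system API. `arpl_model` updates the destination selector in place and returns the new ZF value. The selector values in the usage check are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of ARPL r/m16, r16 (legacy and compatibility modes only).
 * The RPL occupies bits 1:0 of a segment selector. Returns the new ZF value. */
static bool arpl_model(uint16_t *dest, uint16_t src) {
    assert(dest);
    uint16_t dest_rpl = *dest & 0x3u;
    uint16_t src_rpl = src & 0x3u;
    if (dest_rpl < src_rpl) {
        /* Raise DEST[RPL] to SRC[RPL], keeping the index and TI bits. */
        *dest = (uint16_t)((*dest & ~0x3u) | src_rpl);
        return true;  /* ZF <- 1 */
    }
    return false;     /* ZF <- 0, destination unchanged */
}
```

For example, a selector 0x0010 (RPL 0) passed by a caller whose code-segment selector is 0x001B (RPL 3) becomes 0x0013 with ZF set. A second ARPL with an RPL 0 source then leaves the selector unchanged and clears ZF.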

This instruction executes as described in compatibility mode and legacy mode. It is not encodable in 64-bit mode. 

See "Checking Caller Access Privileges" in Chapter 3, "Protected-Mode Memory Management," of the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 3A, for more information about the use of this instruction.

Operation 

IF 64-BIT MODE
    THEN
        See MOVSXD;
    ELSE
        IF DEST[RPL] < SRC[RPL]
            THEN
                ZF ← 1;
                DEST[RPL] ← SRC[RPL];
            ELSE
                ZF ← 0;
        FI;
FI;

Flags Affected 

The ZF flag is set to 1 if the RPL field of the destination operand is less than that of the source operand; otherwise, 
it is set to 0. 
Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege
level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#UD The ARPL instruction is not recognized in real-address mode. 

If the LOCK prefix is used. 


Virtual-8086 Mode Exceptions

#UD The ARPL instruction is not recognized in virtual-8086 mode. 

If the LOCK prefix is used. 


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

Not applicable. 
BLENDPD — Blend Packed Double Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A 0D /r ib BLENDPD xmm1, xmm2/m128, imm8 | RMI | V/V | SSE4_1 | Select packed DP-FP values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.
VEX.NDS.128.66.0F3A.WIG 0D /r ib VBLENDPD xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Select packed double-precision floating-point values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1.
VEX.NDS.256.66.0F3A.WIG 0D /r ib VBLENDPD ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Select packed double-precision floating-point values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8[3:0]


Description 

Double-precision floating-point values from the second source operand (third operand) are conditionally merged
with values from the first source operand (second operand) and written to the destination operand (first operand).
The immediate bits [3:0] determine whether the corresponding double-precision floating-point value in the destination
is copied from the second source or the first source. If the mask bit corresponding to a quadword element is "1", the
double-precision floating-point value in the second source operand is copied; otherwise, the value in the first source
operand is copied.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (VLMAX-1:128) of the corresponding YMM
register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM
register or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of
the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.
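As an illustration of the immediate-controlled selection, the C sketch below models the VEX forms of BLENDPD. Bit j of imm8 picks lane j from the second source. `blendpd_model` is a hypothetical name. For the legacy SSE form, pass the destination array as src1 as well, because the destination doubles as the first source.

```c
#include <assert.h>

/* Hypothetical scalar model of VBLENDPD: lane j takes src2[j] when imm8 bit j is 1,
 * otherwise src1[j]. Use lanes = 2 for the 128-bit forms (only imm8[1:0] matter)
 * and lanes = 4 for the 256-bit form (imm8[3:0]). dest may alias src1. */
static void blendpd_model(double *dest, const double *src1, const double *src2,
                          unsigned imm8, int lanes) {
    assert(lanes == 2 || lanes == 4);
    for (int j = 0; j < lanes; j++) {
        dest[j] = ((imm8 >> j) & 1u) ? src2[j] : src1[j];
    }
}
```

For example, imm8 = 0x5 sets bits 0 and 2, so lanes 0 and 2 come from the second source and lanes 1 and 3 from the first.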

Operation 

BLENDPD (128-bit Legacy SSE version)

IF (IMM8[0] = 0) THEN DEST[63:0] ← DEST[63:0]
    ELSE DEST[63:0] ← SRC[63:0] FI
IF (IMM8[1] = 0) THEN DEST[127:64] ← DEST[127:64]
    ELSE DEST[127:64] ← SRC[127:64] FI
DEST[VLMAX-1:128] (Unmodified)

VBLENDPD (VEX.128 encoded version)

IF (IMM8[0] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (IMM8[1] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
DEST[VLMAX-1:128] ← 0

VBLENDPD (VEX.256 encoded version)

IF (IMM8[0] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (IMM8[1] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
IF (IMM8[2] = 0) THEN DEST[191:128] ← SRC1[191:128]
    ELSE DEST[191:128] ← SRC2[191:128] FI
IF (IMM8[3] = 0) THEN DEST[255:192] ← SRC1[255:192]
    ELSE DEST[255:192] ← SRC2[255:192] FI

Intel C/C++ Compiler Intrinsic Equivalent 

BLENDPD: __m128d _mm_blend_pd (__m128d v1, __m128d v2, const int mask);

VBLENDPD: __m256d _mm256_blend_pd (__m256d a, __m256d b, const int mask);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4. 
BEXTR — Bit Field Extract


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDS.LZ.0F38.W0 F7 /r BEXTR r32a, r/m32, r32b | RMV | V/V | BMI1 | Contiguous bitwise extract from r/m32 using r32b as control; store result in r32a.
VEX.NDS.LZ.0F38.W1 F7 /r BEXTR r64a, r/m64, r64b | RMV | V/N.E. | BMI1 | Contiguous bitwise extract from r/m64 using r64b as control; store result in r64a.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMV | ModRM:reg (w) | ModRM:r/m (r) | VEX.vvvv (r) | NA


Description 

Extracts contiguous bits from the first source operand (the second operand) using an index value and length value
specified in the second source operand (the third operand). Bits 7:0 of the second source operand specify the
starting bit position of bit extraction (START). A START value exceeding the operand size will not extract any bits
from the first source operand. Bits 15:8 of the second source operand specify the maximum number of bits (LENGTH)
beginning at the START position to extract. Only bit positions up to (OperandSize - 1) of the first source operand are
extracted. The extracted bits are written to the destination register, starting from the least significant bit. All
higher-order bits in the destination operand (starting at bit position LENGTH) are zeroed. The destination register
is cleared if no bits are extracted.
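The START and LENGTH rules above can be checked against a small C model of the 32-bit form. This includes the cases where START lies beyond the operand size or where START + LENGTH runs past it. `bextr32_model` is a hypothetical name. It returns the result, reports ZF through a pointer, and does not model the other flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of 32-bit BEXTR r32a, r/m32, r32b.
 * control[7:0] = START, control[15:8] = LENGTH; *zf receives the ZF result. */
static uint32_t bextr32_model(uint32_t src, uint32_t control, int *zf) {
    assert(zf);
    unsigned start = control & 0xFFu;
    unsigned len = (control >> 8) & 0xFFu;
    uint32_t result = 0;
    if (start < 32 && len > 0) {
        uint32_t shifted = src >> start;          /* start <= 31, so the shift is defined */
        unsigned avail = 32 - start;              /* source bits available at or above START */
        if (len < avail) {
            shifted &= (UINT32_C(1) << len) - 1;  /* len <= 31 here, so the shift is defined */
        }
        /* If len >= avail, the bits above OperandSize - 1 are already zero. */
        result = shifted;
    }
    *zf = (result == 0);
    return result;
}
```

For example, a control value of (8 << 8) | 4 extracts bits 11:4, and a START of 40 always yields 0 with ZF set.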

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in
64-bit mode. In 64-bit mode, operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An
attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation 

START ← SRC2[7:0];
LEN ← SRC2[15:8];
TEMP ← ZERO_EXTEND_TO_512(SRC1);
DEST ← ZERO_EXTEND(TEMP[START+LEN-1:START]);
ZF ← (DEST = 0);

Flags Affected 

ZF is updated based on the result. AF, SF, and PF are undefined. All other flags are cleared. 

Intel C/C++ Compiler Intrinsic Equivalent 

BEXTR: unsigned __int32 _bextr_u32(unsigned __int32 src, unsigned __int32 start, unsigned __int32 len);

BEXTR: unsigned __int64 _bextr_u64(unsigned __int64 src, unsigned __int32 start, unsigned __int32 len);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Section 2.5.1, "Exception Conditions for VEX-Encoded GPR Instructions", Table 2-29; additionally 
#UD If VEX.L = 1.
BLENDPS — Blend Packed Single Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A 0C /r ib BLENDPS xmm1, xmm2/m128, imm8 | RMI | V/V | SSE4_1 | Select packed single precision floating-point values from xmm1 and xmm2/m128 from mask specified in imm8 and store the values into xmm1.
VEX.NDS.128.66.0F3A.WIG 0C /r ib VBLENDPS xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Select packed single-precision floating-point values from xmm2 and xmm3/m128 from mask in imm8 and store the values in xmm1.
VEX.NDS.256.66.0F3A.WIG 0C /r ib VBLENDPS ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Select packed single-precision floating-point values from ymm2 and ymm3/m256 from mask in imm8 and store the values in ymm1.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8


Description 

Packed single-precision floating-point values from the second source operand (third operand) are conditionally
merged with values from the first source operand (second operand) and written to the destination operand (first
operand). The immediate bits [7:0] determine whether the corresponding single-precision floating-point value in
the destination is copied from the second source or the first source. If the mask bit corresponding to a doubleword
element is "1", the single-precision floating-point value in the second source operand is copied; otherwise, the value
in the first source operand is copied.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register, and the upper bits (VLMAX-1:128) of the corresponding YMM
register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand is an XMM register
or 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the
corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.
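Because each immediate bit maps to exactly one lane, the imm8 value can be derived from a per-lane choice. The C sketch below shows a hypothetical helper that builds imm8, along with a scalar model of the blend itself. Both names are illustrative. With the `_mm_blend_ps` and `_mm256_blend_ps` intrinsics, the mask argument is expected to be a compile-time constant. A helper like this is therefore mainly useful for deriving or checking mask values rather than for computing them at run time.

```c
#include <assert.h>

/* Hypothetical helper: build a BLENDPS/VBLENDPS immediate from per-lane choices,
 * where take_src2[j] != 0 means lane j should come from the second source. */
static unsigned blendps_imm8(const int take_src2[], int lanes) {
    assert(lanes == 4 || lanes == 8);
    unsigned imm8 = 0;
    for (int j = 0; j < lanes; j++) {
        if (take_src2[j]) {
            imm8 |= 1u << j;
        }
    }
    return imm8;
}

/* Scalar model of VBLENDPS: lane j = imm8 bit j ? src2[j] : src1[j]. */
static void blendps_model(float *dest, const float *src1, const float *src2,
                          unsigned imm8, int lanes) {
    assert(lanes == 4 || lanes == 8);
    for (int j = 0; j < lanes; j++) {
        dest[j] = ((imm8 >> j) & 1u) ? src2[j] : src1[j];
    }
}
```

Selecting lanes 1, 2, and 7 from the second source gives imm8 = 0x86.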

Operation 

BLENDPS (128-bit Legacy SSE version)

IF (IMM8[0] = 0) THEN DEST[31:0] ← DEST[31:0]
    ELSE DEST[31:0] ← SRC[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] ← DEST[63:32]
    ELSE DEST[63:32] ← SRC[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] ← DEST[95:64]
    ELSE DEST[95:64] ← SRC[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] ← DEST[127:96]
    ELSE DEST[127:96] ← SRC[127:96] FI
DEST[VLMAX-1:128] (Unmodified)

VBLENDPS (VEX.128 encoded version)

IF (IMM8[0] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
DEST[VLMAX-1:128] ← 0


VBLENDPS (VEX.256 encoded version)

IF (IMM8[0] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (IMM8[1] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (IMM8[2] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (IMM8[3] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
IF (IMM8[4] = 0) THEN DEST[159:128] ← SRC1[159:128]
    ELSE DEST[159:128] ← SRC2[159:128] FI
IF (IMM8[5] = 0) THEN DEST[191:160] ← SRC1[191:160]
    ELSE DEST[191:160] ← SRC2[191:160] FI
IF (IMM8[6] = 0) THEN DEST[223:192] ← SRC1[223:192]
    ELSE DEST[223:192] ← SRC2[223:192] FI
IF (IMM8[7] = 0) THEN DEST[255:224] ← SRC1[255:224]
    ELSE DEST[255:224] ← SRC2[255:224] FI

Intel C/C++ Compiler Intrinsic Equivalent 

BLENDPS: __m128 _mm_blend_ps (__m128 v1, __m128 v2, const int mask);

VBLENDPS: __m256 _mm256_blend_ps (__m256 a, __m256 b, const int mask);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4. 
BLENDVPD — Variable Blend Packed Double Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 15 /r BLENDVPD xmm1, xmm2/m128, <XMM0> | RM0 | V/V | SSE4_1 | Select packed DP FP values from xmm1 and xmm2 from mask specified in XMM0 and store the values in xmm1.
VEX.NDS.128.66.0F3A.W0 4B /r /is4 VBLENDVPD xmm1, xmm2, xmm3/m128, xmm4 | RVMR | V/V | AVX | Conditionally copy double-precision floating-point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the mask operand, xmm4.
VEX.NDS.256.66.0F3A.W0 4B /r /is4 VBLENDVPD ymm1, ymm2, ymm3/m256, ymm4 | RVMR | V/V | AVX | Conditionally copy double-precision floating-point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the mask operand, ymm4.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM0 | ModRM:reg (r, w) | ModRM:r/m (r) | implicit XMM0 | NA
RVMR | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8[7:4]


Description 

Conditionally copies each quadword data element of double-precision floating-point value from the second source
operand or the first source operand, depending on mask bits defined in the mask register operand. The mask bits
are the most significant bit in each quadword element of the mask register.

Each quadword element of the destination operand is copied from:

• the corresponding quadword element in the second source operand, if a mask bit is "1"; or

• the corresponding quadword element in the first source operand, if a mask bit is "0".

The register assignment of the implicit mask operand for BLENDVPD is defined to be the architectural register
XMM0.

128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:128)
of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined
to be the architectural register XMM0. An attempt to execute BLENDVPD with a VEX prefix will cause #UD.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second
source operand is an XMM register or 128-bit memory location. The mask operand is the third source register and
is encoded in bits[7:4] of the immediate byte (imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is
ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed.
VEX.W must be 0; otherwise, the instruction will #UD.

VEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source
operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register and
is encoded in bits[7:4] of the immediate byte (imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is
ignored. VEX.W must be 0; otherwise, the instruction will #UD.

VBLENDVPD permits the mask to be any XMM or YMM register. In contrast, BLENDVPD treats XMM0 implicitly as the
mask and does not support non-destructive destination operation.
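Only the most significant bit of each mask quadword matters, and for a double-precision mask value that bit is the sign bit. The C sketch below is a hypothetical scalar model of VBLENDVPD that illustrates this rule. `blendvpd_model` is an illustrative name. It assumes `double` is a 64-bit IEEE 754 type.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar model of VBLENDVPD: only bit 63 (the sign bit) of each mask
 * lane is examined; 1 selects src2[j], 0 selects src1[j]. */
static void blendvpd_model(double *dest, const double *src1, const double *src2,
                           const double *mask, int lanes) {
    assert(lanes == 2 || lanes == 4);
    for (int j = 0; j < lanes; j++) {
        uint64_t m;
        memcpy(&m, &mask[j], sizeof m); /* inspect the raw bit pattern */
        dest[j] = (m >> 63) ? src2[j] : src1[j];
    }
}
```

As a consequence, -0.0 and tiny negative values in the mask select the second source, while +0.0 and positive values select the first source, regardless of magnitude.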
Operation

BLENDVPD (128-bit Legacy SSE version)

MASK ← XMM0
IF (MASK[63] = 0) THEN DEST[63:0] ← DEST[63:0]
    ELSE DEST[63:0] ← SRC[63:0] FI
IF (MASK[127] = 0) THEN DEST[127:64] ← DEST[127:64]
    ELSE DEST[127:64] ← SRC[127:64] FI
DEST[VLMAX-1:128] (Unmodified)


VBLENDVPD (VEX.128 encoded version)

MASK ← SRC3
IF (MASK[63] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (MASK[127] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
DEST[VLMAX-1:128] ← 0


VBLENDVPD (VEX.256 encoded version)

MASK ← SRC3
IF (MASK[63] = 0) THEN DEST[63:0] ← SRC1[63:0]
    ELSE DEST[63:0] ← SRC2[63:0] FI
IF (MASK[127] = 0) THEN DEST[127:64] ← SRC1[127:64]
    ELSE DEST[127:64] ← SRC2[127:64] FI
IF (MASK[191] = 0) THEN DEST[191:128] ← SRC1[191:128]
    ELSE DEST[191:128] ← SRC2[191:128] FI
IF (MASK[255] = 0) THEN DEST[255:192] ← SRC1[255:192]
    ELSE DEST[255:192] ← SRC2[255:192] FI

Intel C/C++ Compiler Intrinsic Equivalent 

BLENDVPD: __m128d _mm_blendv_pd (__m128d v1, __m128d v2, __m128d v3);

VBLENDVPD: __m128d _mm_blendv_pd (__m128d a, __m128d b, __m128d mask);
VBLENDVPD: __m256d _mm256_blendv_pd (__m256d a, __m256d b, __m256d mask);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type 4; additionally 
#UD If VEX.W = 1.
BLENDVPS — Variable Blend Packed Single Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 38 14 /r BLENDVPS xmm1, xmm2/m128, <XMM0> | RM0 | V/V | SSE4_1 | Select packed single precision floating-point values from xmm1 and xmm2/m128 from mask specified in XMM0 and store the values into xmm1.
VEX.NDS.128.66.0F3A.W0 4A /r /is4 VBLENDVPS xmm1, xmm2, xmm3/m128, xmm4 | RVMR | V/V | AVX | Conditionally copy single-precision floating-point values from xmm2 or xmm3/m128 to xmm1, based on mask bits in the specified mask operand, xmm4.
VEX.NDS.256.66.0F3A.W0 4A /r /is4 VBLENDVPS ymm1, ymm2, ymm3/m256, ymm4 | RVMR | V/V | AVX | Conditionally copy single-precision floating-point values from ymm2 or ymm3/m256 to ymm1, based on mask bits in the specified mask register, ymm4.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM0 | ModRM:reg (r, w) | ModRM:r/m (r) | implicit XMM0 | NA
RVMR | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8[7:4]


Description 

Conditionally copies each dword data element of single-precision floating-point value from the second source
operand or the first source operand, depending on mask bits defined in the mask register operand. The mask bits
are the most significant bit in each dword element of the mask register.

Each dword element of the destination operand is copied from:

• the corresponding dword element in the second source operand, if a mask bit is "1"; or

• the corresponding dword element in the first source operand, if a mask bit is "0".

The register assignment of the implicit mask operand for BLENDVPS is defined to be the architectural register
XMM0.

128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (VLMAX-1:128)
of the corresponding YMM destination register remain unchanged. The mask register operand is implicitly defined
to be the architectural register XMM0. An attempt to execute BLENDVPS with a VEX prefix will cause #UD.

VEX.128 encoded version: The first source operand and the destination operand are XMM registers. The second
source operand is an XMM register or 128-bit memory location. The mask operand is the third source register and
is encoded in bits[7:4] of the immediate byte (imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is
ignored. The upper bits (VLMAX-1:128) of the corresponding YMM register (destination register) are zeroed.
VEX.W must be 0; otherwise, the instruction will #UD.

VEX.256 encoded version: The first source operand and destination operand are YMM registers. The second source
operand can be a YMM register or a 256-bit memory location. The mask operand is the third source register and
is encoded in bits[7:4] of the immediate byte (imm8). The bits[3:0] of imm8 are ignored. In 32-bit mode, imm8[7] is
ignored. VEX.W must be 0; otherwise, the instruction will #UD.

VBLENDVPS permits the mask to be any XMM or YMM register. In contrast, BLENDVPS treats XMM0 implicitly as the
mask and does not support non-destructive destination operation.
Operation

BLENDVPS (128-bit Legacy SSE version)

MASK ← XMM0
IF (MASK[31] = 0) THEN DEST[31:0] ← DEST[31:0]
    ELSE DEST[31:0] ← SRC[31:0] FI
IF (MASK[63] = 0) THEN DEST[63:32] ← DEST[63:32]
    ELSE DEST[63:32] ← SRC[63:32] FI
IF (MASK[95] = 0) THEN DEST[95:64] ← DEST[95:64]
    ELSE DEST[95:64] ← SRC[95:64] FI
IF (MASK[127] = 0) THEN DEST[127:96] ← DEST[127:96]
    ELSE DEST[127:96] ← SRC[127:96] FI
DEST[VLMAX-1:128] (Unmodified)


VBLENDVPS (VEX.128 encoded version)

MASK ← SRC3
IF (MASK[31] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (MASK[63] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (MASK[95] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (MASK[127] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
DEST[VLMAX-1:128] ← 0


VBLENDVPS (VEX.256 encoded version)

MASK ← SRC3
IF (MASK[31] = 0) THEN DEST[31:0] ← SRC1[31:0]
    ELSE DEST[31:0] ← SRC2[31:0] FI
IF (MASK[63] = 0) THEN DEST[63:32] ← SRC1[63:32]
    ELSE DEST[63:32] ← SRC2[63:32] FI
IF (MASK[95] = 0) THEN DEST[95:64] ← SRC1[95:64]
    ELSE DEST[95:64] ← SRC2[95:64] FI
IF (MASK[127] = 0) THEN DEST[127:96] ← SRC1[127:96]
    ELSE DEST[127:96] ← SRC2[127:96] FI
IF (MASK[159] = 0) THEN DEST[159:128] ← SRC1[159:128]
    ELSE DEST[159:128] ← SRC2[159:128] FI
IF (MASK[191] = 0) THEN DEST[191:160] ← SRC1[191:160]
    ELSE DEST[191:160] ← SRC2[191:160] FI
IF (MASK[223] = 0) THEN DEST[223:192] ← SRC1[223:192]
    ELSE DEST[223:192] ← SRC2[223:192] FI
IF (MASK[255] = 0) THEN DEST[255:224] ← SRC1[255:224]
    ELSE DEST[255:224] ← SRC2[255:224] FI
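The per-lane selection in the Operation pseudocode can be illustrated with a small scalar C model. This is a sketch, not Intel code: the function name blendvps_model is hypothetical, and it works on raw 32-bit lane bit patterns (uint32_t) on the assumption that a blend only moves bits and never interprets them as floating-point values. Passing the same array as dest and src1, with the mask taken from XMM0, mirrors the destructive legacy BLENDVPS form.

```c
#include <stdint.h>

/* Scalar model of the 128-bit (V)BLENDVPS lane selection.
 * Hypothetical helper: operates on raw 32-bit lane bit patterns. */
static void blendvps_model(uint32_t dest[4], const uint32_t src1[4],
                           const uint32_t src2[4], const uint32_t mask[4])
{
    for (int i = 0; i < 4; i++) {
        /* Only bit 31 (the sign bit) of each mask lane is examined:
         * 1 selects the second source, 0 selects the first source. */
        dest[i] = (mask[i] & 0x80000000u) ? src2[i] : src1[i];
    }
}
```

Each lane is read before it is written, so dest may alias src1. On real hardware, the listed intrinsic _mm_blendv_ps (SSE4.1, declared in <smmintrin.h>) performs the same selection on __m128 values.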

Intel C/C++ Compiler Intrinsic Equivalent 

BLENDVPS: __m128 _mm_blendv_ps(__m128 v1, __m128 v2, __m128 v3);

VBLENDVPS: __m128 _mm_blendv_ps (__m128 a, __m128 b, __m128 mask);

VBLENDVPS: __m256 _mm256_blendv_ps (__m256 a, __m256 b, __m256 mask);

SIMD Floating-Point Exceptions 

None 
Other Exceptions

See Exceptions Type 4; additionally 
#UD If VEX.W = 1.
BLSI — Extract Lowest Set Isolated Bit


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /3 BLSI r32, r/m32 | VM | V/V | BMI1 | Extract lowest set bit from r/m32 and set that bit in r32.
VEX.NDD.LZ.0F38.W1 F3 /3 BLSI r64, r/m64 | VM | V/N.E. | BMI1 | Extract lowest set bit from r/m64, and set that bit in r64.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA


Description 

Extracts the lowest set bit from the source operand and sets the corresponding bit in the destination register. All other bits in the destination operand are zeroed. If no bits are set in the source operand, BLSI sets all the bits in the destination to 0, sets ZF, and clears CF.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode, operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

temp ← (-SRC) bitwiseAND (SRC);
SF ← temp[OperandSize -1];
ZF ← (temp = 0);
IF SRC = 0
    CF ← 0;
ELSE
    CF ← 1;
FI
DEST ← temp;
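The Operation pseudocode maps directly onto portable C unsigned arithmetic. The following is a sketch for the 32-bit form only; the names bmi_flags and blsi32_model are hypothetical, and AF/PF are left out because they are undefined after BLSI.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record used only by this sketch (AF and PF not modeled). */
struct bmi_flags { bool sf, zf, cf, of; };

/* Model of BLSI r32, r/m32: temp <- (-SRC) AND SRC. */
static uint32_t blsi32_model(uint32_t src, struct bmi_flags *f)
{
    uint32_t temp = (0u - src) & src;  /* unsigned negation is well defined in C */
    f->sf = (temp >> 31) != 0;         /* SF <- temp[OperandSize-1] */
    f->zf = (temp == 0);               /* ZF <- (temp = 0) */
    f->cf = (src != 0);                /* CF <- 0 if SRC = 0, else 1 */
    f->of = false;                     /* OF is cleared */
    return temp;
}
```

For example, a source of 0x58 (binary 0101 1000) yields 0x8. With BMI1 enabled in the compiler (for example -mbmi in GCC or Clang), the listed intrinsic _blsi_u32 from <immintrin.h> computes the same result value.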

Flags Affected 

ZF and SF are updated based on the result. CF is set if the source is not zero. The OF flag is cleared. The AF and PF flags are undefined.

Intel C/C++ Compiler Intrinsic Equivalent 

BLSI: unsigned __int32 _blsi_u32(unsigned __int32 src);

BLSI: unsigned __int64 _blsi_u64(unsigned __int64 src);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Section 2.5.1, "Exception Conditions for VEX-Encoded GPR Instructions", Table 2-29; additionally 
#UD If VEX.W = 1.
BLSMSK — Get Mask Up to Lowest Set Bit


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /2 BLSMSK r32, r/m32 | VM | V/V | BMI1 | Set all lower bits in r32 to "1" starting from bit 0 to lowest set bit in r/m32.
VEX.NDD.LZ.0F38.W1 F3 /2 BLSMSK r64, r/m64 | VM | V/N.E. | BMI1 | Set all lower bits in r64 to "1" starting from bit 0 to lowest set bit in r/m64.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA


Description 

Sets all the lower bits of the destination operand to "1" up to and including the lowest set bit (=1) in the source operand. If the source operand is zero, BLSMSK sets all bits of the destination operand to 1 and also sets CF to 1.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode, operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

temp ← (SRC-1) XOR (SRC);
SF ← temp[OperandSize -1];
ZF ← 0;
IF SRC = 0
    CF ← 1;
ELSE
    CF ← 0;
FI
DEST ← temp;
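A minimal C sketch of the 32-bit form follows the pseudocode line by line. The names bmi_flags and blsmsk32_model are hypothetical, and AF/PF are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record used only by this sketch (AF and PF not modeled). */
struct bmi_flags { bool sf, zf, cf, of; };

/* Model of BLSMSK r32, r/m32: temp <- (SRC-1) XOR SRC. */
static uint32_t blsmsk32_model(uint32_t src, struct bmi_flags *f)
{
    uint32_t temp = (src - 1u) ^ src;  /* SRC = 0 wraps SRC-1 to all ones */
    f->sf = (temp >> 31) != 0;         /* SF <- temp[OperandSize-1] */
    f->zf = false;                     /* ZF is always cleared */
    f->cf = (src == 0);                /* CF <- 1 only if SRC = 0 */
    f->of = false;                     /* OF is cleared */
    return temp;
}
```

For a source of 0x58 (binary 0101 1000), SRC-1 is 0x57 and the XOR gives 0x0F: bits 0 through 3, that is, everything up to and including the lowest set bit.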

Flags Affected 

SF is updated based on the result. CF is set if the source is zero. The ZF and OF flags are cleared. The AF and PF flags are undefined.

Intel C/C++ Compiler Intrinsic Equivalent 

BLSMSK: unsigned __int32 _blsmsk_u32(unsigned __int32 src);

BLSMSK: unsigned __int64 _blsmsk_u64(unsigned __int64 src);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Section 2.5.1, "Exception Conditions for VEX-Encoded GPR Instructions", Table 2-29; additionally 
#UD If VEX.W = 1.
BLSR — Reset Lowest Set Bit


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
VEX.NDD.LZ.0F38.W0 F3 /1 BLSR r32, r/m32 | VM | V/V | BMI1 | Reset lowest set bit of r/m32, keep all other bits of r/m32 and write result to r32.
VEX.NDD.LZ.0F38.W1 F3 /1 BLSR r64, r/m64 | VM | V/N.E. | BMI1 | Reset lowest set bit of r/m64, keep all other bits of r/m64 and write result to r64.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
VM | VEX.vvvv (w) | ModRM:r/m (r) | NA | NA


Description 

Copies all bits from the source operand to the destination operand and resets (=0) the bit position in the destination operand that corresponds to the lowest set bit of the source operand. If the source operand is zero, BLSR sets CF.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in 64-bit mode. In 64-bit mode, operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation

temp ← (SRC-1) bitwiseAND (SRC);
SF ← temp[OperandSize -1];
ZF ← (temp = 0);
IF SRC = 0
    CF ← 1;
ELSE
    CF ← 0;
FI
DEST ← temp;
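The same approach gives a short C model of the 32-bit form. The names bmi_flags and blsr32_model are hypothetical, and AF/PF are not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag record used only by this sketch (AF and PF not modeled). */
struct bmi_flags { bool sf, zf, cf, of; };

/* Model of BLSR r32, r/m32: temp <- (SRC-1) AND SRC. */
static uint32_t blsr32_model(uint32_t src, struct bmi_flags *f)
{
    uint32_t temp = (src - 1u) & src;  /* clears only the lowest set bit */
    f->sf = (temp >> 31) != 0;         /* SF <- temp[OperandSize-1] */
    f->zf = (temp == 0);               /* ZF <- (temp = 0) */
    f->cf = (src == 0);                /* CF <- 1 only if SRC = 0 */
    f->of = false;                     /* OF is cleared */
    return temp;
}
```

The expression x & (x - 1) is also the usual C idiom for stepping through set bits (while (x) { ...; x &= x - 1; }); with BMI1 enabled, a compiler may choose BLSR for it, though that is up to the compiler.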

Flags Affected 

ZF and SF flags are updated based on the result. CF is set if the source is zero. The OF flag is cleared. The AF and PF flags are undefined.

Intel C/C++ Compiler Intrinsic Equivalent 

BLSR: unsigned __int32 _blsr_u32(unsigned __int32 src);

BLSR: unsigned __int64 _blsr_u64(unsigned __int64 src);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Section 2.5.1, "Exception Conditions for VEX-Encoded GPR Instructions", Table 2-29; additionally 
#UD If VEX.W = 1.
BNDCL—Check Lower Bound


Opcode/Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description
F3 0F 1A /r BNDCL bnd, r/m32 | RM | NE/V | MPX | Generate a #BR if the address in r/m32 is lower than the lower bound in bnd.LB.
F3 0F 1A /r BNDCL bnd, r/m64 | RM | V/NE | MPX | Generate a #BR if the address in r/m64 is lower than the lower bound in bnd.LB.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA


Description 

Compares the address in the second operand with the lower bound in bnd. The second operand can be either a register or a memory operand. If the address is lower than the lower bound in bnd.LB, the instruction sets BNDSTATUS to 01H and signals a #BR exception.

This instruction does not cause any memory access, and does not read or write any flags. 

Operation

BNDCL BND, reg
IF reg < BND.LB Then
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCL BND, mem
TEMP ← LEA(mem);
IF TEMP < BND.LB Then
    BNDSTATUS ← 01H;
    #BR;
FI;
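The lower-bound check can be modeled in C as a single comparison. This is a sketch under stated assumptions: struct bnd_reg and bndcl_would_fault are hypothetical names, the upper bound is kept in 1's complement form as BNDMK stores it, and addresses are assumed to be compared as unsigned values (the pseudocode does not spell out signedness).

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of an MPX bound register;
 * ub is kept in 1's complement form, as BNDMK writes it. */
struct bnd_reg { uint64_t lb, ub; };

/* Returns true when BNDCL would set BNDSTATUS to 01H and raise #BR.
 * addr is the register operand, or LEA(mem) for a memory operand.
 * Assumption: the comparison is unsigned. */
static bool bndcl_would_fault(const struct bnd_reg *bnd, uint64_t addr)
{
    return addr < bnd->lb;
}
```

Because only LEA(mem) is used, no memory is accessed, which matches the statement that the instruction performs no memory access.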

Intel C/C++ Compiler Intrinsic Equivalent 

BNDCL: void _bnd_chk_ptr_lbounds(const void *q)

Flags Affected 

None 

Protected Mode Exceptions 

#BR If lower bound check fails. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.
Real-Address Mode Exceptions

#BR If lower bound check fails. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

Virtual-8086 Mode Exceptions

#BR If lower bound check fails. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#UD If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

Same exceptions as in protected mode. 
BNDCU/BNDCN-Check Upper Bound


Opcode/Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description
F2 0F 1A /r BNDCU bnd, r/m32 | RM | NE/V | MPX | Generate a #BR if the address in r/m32 is higher than the upper bound in bnd.UB (bnd.UB in 1's complement form).
F2 0F 1A /r BNDCU bnd, r/m64 | RM | V/NE | MPX | Generate a #BR if the address in r/m64 is higher than the upper bound in bnd.UB (bnd.UB in 1's complement form).
F2 0F 1B /r BNDCN bnd, r/m32 | RM | NE/V | MPX | Generate a #BR if the address in r/m32 is higher than the upper bound in bnd.UB (bnd.UB not in 1's complement form).
F2 0F 1B /r BNDCN bnd, r/m64 | RM | V/NE | MPX | Generate a #BR if the address in r/m64 is higher than the upper bound in bnd.UB (bnd.UB not in 1's complement form).


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA


Description 

Compares the address in the second operand with the upper bound in bnd. The second operand can be either a register or a memory operand. If the address is higher than the upper bound in bnd.UB, the instruction sets BNDSTATUS to 01H and signals a #BR exception.

BNDCU performs a 1's complement operation on the upper bound of bnd before comparing it with the address. BNDCN compares the address directly against the upper bound in bnd, which must already have been converted out of 1's complement form.

These instructions do not cause any memory access, and do not read or write any flags.

Effective address computation of m32/m64 has identical behavior to LEA.

Operation

BNDCU BND, reg
IF reg > NOT(BND.UB) Then
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCU BND, mem
TEMP ← LEA(mem);
IF TEMP > NOT(BND.UB) Then
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCN BND, reg
IF reg > BND.UB Then
    BNDSTATUS ← 01H;
    #BR;
FI;

BNDCN BND, mem
TEMP ← LEA(mem);
IF TEMP > BND.UB Then
    BNDSTATUS ← 01H;
    #BR;
FI;
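The difference between BNDCU and BNDCN is only whether the stored upper bound is inverted before the comparison. The C sketch below makes that explicit; struct bnd_reg and the two function names are hypothetical, and the comparison is assumed to be unsigned. It also shows why INIT bounds (UB = 0) never fault under BNDCU: NOT(0) is the largest address.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of an MPX bound register. */
struct bnd_reg { uint64_t lb, ub; };

/* BNDCU: bnd.UB is stored in 1's complement form, so invert it first. */
static bool bndcu_would_fault(const struct bnd_reg *bnd, uint64_t addr)
{
    return addr > ~bnd->ub;
}

/* BNDCN: bnd.UB has already been converted out of 1's complement form. */
static bool bndcn_would_fault(const struct bnd_reg *bnd, uint64_t addr)
{
    return addr > bnd->ub;
}
```

Both functions return true exactly when the pseudocode would set BNDSTATUS to 01H and raise #BR.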

Intel C/C++ Compiler Intrinsic Equivalent 

BNDCU: void _bnd_chk_ptr_ubounds(const void *q)

Flags Affected 

None 

Protected Mode Exceptions 

#BR If upper bound check fails. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

Real-Address Mode Exceptions 

#BR If upper bound check fails. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

Virtual-8086 Mode Exceptions

#BR If upper bound check fails. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#UD If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

Same exceptions as in protected mode. 
BNDLDX—Load Extended Bounds Using Address Translation


Opcode/Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description
0F 1A /r BNDLDX bnd, mib | RM | V/V | MPX | Load the bounds stored in a bound table entry (BTE) into bnd with address translation using the base of mib and conditional on the index of mib matching the pointer value in the BTE.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | SIB.base (r): Address of pointer; SIB.index (r) | NA


Description 

BNDLDX uses the linear address constructed from the base register and displacement of the SIB-addressing form 
of the memory operand (mib) to perform address translation to access a bound table entry and conditionally load 
the bounds in the BTE to the destination. The destination register is updated with the bounds in the BTE, if the 
content of the index register of mib matches the pointer value stored in the BTE. 

If the pointer value comparison fails, the destination is updated with INIT bounds (lb = 0x0, ub = 0x0). As noted earlier, the upper bound is represented in 1's complement form; therefore, an upper bound of 0x0 allows access to all of memory.

This instruction does not access memory at the linear address of mib or at the effective address referenced by the base, and does not read or write any flags.

Segment overrides apply to the linear address computation with the base of mib, and are used during address translation to generate the address of the bound table entry. By default, the address of the BTE is assumed to be a linear address. No segmentation checks are performed on the base of mib.

The base of mib will not be checked for canonical address violation as it does not access memory. 

Any encoding of this instruction that does not specify base or index register will treat those registers as zero 
(constant). The reg-reg form of this instruction will remain a NOP. 

The scale field of the SIB byte has no effect on these instructions and is ignored. 

The bound register may be partially updated on memory faults. The order in which memory operands are loaded is 
implementation specific. 

Operation

base ← mib.SIB.base ? mib.SIB.base + Disp : 0;
ptr_value ← mib.SIB.index ? mib.SIB.index : 0;

Outside 64-bit mode

A_BDE[31:0] ← (Zero_extend32(base[31:12] << 2) + (BNDCFG[31:12] << 12));
A_BT[31:0] ← LoadFrom(A_BDE);
IF A_BT[0] equal 0 Then
    BNDSTATUS ← A_BDE | 02H;
    #BR;
FI;
A_BTE[31:0] ← (Zero_extend32(base[11:2] << 4) + (A_BT[31:2] << 2));
Temp_lb[31:0] ← LoadFrom(A_BTE);
Temp_ub[31:0] ← LoadFrom(A_BTE + 4);
Temp_ptr[31:0] ← LoadFrom(A_BTE + 8);
IF Temp_ptr equal ptr_value Then
    BND.LB ← Temp_lb;
    BND.UB ← Temp_ub;
ELSE
    BND.LB ← 0;
    BND.UB ← 0;
FI;

In 64-bit mode

A_BDE[63:0] ← (Zero_extend64(base[47+MAWA:20] << 3) + (BNDCFG[63:20] << 12)); (* see footnote 1 *)
A_BT[63:0] ← LoadFrom(A_BDE);
IF A_BT[0] equal 0 Then
    BNDSTATUS ← A_BDE | 02H;
    #BR;
FI;
A_BTE[63:0] ← (Zero_extend64(base[19:3] << 5) + (A_BT[63:3] << 3));
Temp_lb[63:0] ← LoadFrom(A_BTE);
Temp_ub[63:0] ← LoadFrom(A_BTE + 8);
Temp_ptr[63:0] ← LoadFrom(A_BTE + 16);
IF Temp_ptr equal ptr_value Then
    BND.LB ← Temp_lb;
    BND.UB ← Temp_ub;
ELSE
    BND.LB ← 0;
    BND.UB ← 0;
FI;
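The two-level address translation used outside 64-bit mode can be written as plain integer arithmetic. The sketch below transcribes the A_BDE and A_BTE formulas from the pseudocode; the function names are hypothetical, and it only computes addresses (it does not perform the loads or raise #BR).

```c
#include <stdbool.h>
#include <stdint.h>

/* A_BDE = (base[31:12] << 2) + (BNDCFG[31:12] << 12), outside 64-bit mode. */
static uint32_t mpx32_bde_addr(uint32_t base, uint32_t bndcfg)
{
    uint32_t bd_offset = (base >> 12) << 2;   /* base[31:12] << 2: 4-byte BDEs */
    uint32_t bd_base = (bndcfg >> 12) << 12;  /* BNDCFG[31:12] << 12 */
    return bd_base + bd_offset;
}

/* A BDE is valid only when bit 0 is set; otherwise BNDLDX/BNDSTX raise #BR. */
static bool mpx32_bde_valid(uint32_t a_bt)
{
    return (a_bt & 1u) != 0;
}

/* A_BTE = (base[11:2] << 4) + (A_BT[31:2] << 2), given the BDE contents A_BT. */
static uint32_t mpx32_bte_addr(uint32_t base, uint32_t a_bt)
{
    uint32_t bt_offset = ((base >> 2) & 0x3FFu) << 4;  /* base[11:2] << 4: 16-byte BTEs */
    uint32_t bt_base = (a_bt >> 2) << 2;               /* A_BT[31:2] << 2: drop low bits */
    return bt_base + bt_offset;
}
```

For example, with base = 0x12345678 and BNDCFG = 0x00400001, A_BDE is 0x00448D14; if that entry holds 0x00800001 (valid bit set), A_BTE is 0x008019E0. The 16-byte BTE stride matches the lower bound, upper bound, and pointer loads at A_BTE, A_BTE + 4, and A_BTE + 8.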

Intel C/C++ Compiler Intrinsic Equivalent 

BNDLDX: Generated by compiler as needed. 

Flags Affected 

None 

Protected Mode Exceptions 

#BR If the bound directory entry is invalid. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit. 

If DS register contains a NULL segment selector. 

#PF(fault code) If a page fault occurs. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit. 


1. If CPL < 3, the supervisor MAWA (MAWAS) is used; this value is 0. If CPL = 3, the user MAWA (MAWAU) is used; this value is enumerated in CPUID.(EAX=07H,ECX=0H):ECX.MAWAU[bits 21:17]. See Section 17.3.1 of Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.
Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit.

#PF(fault code) If a page fault occurs.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#BR If the bound directory entry is invalid.

#UD If ModRM is RIP relative.

If the LOCK prefix is used.

If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#GP(0) If the memory address (A_BDE or A_BTE) is in a non-canonical form.

#PF(fault code) If a page fault occurs.
BNDMK—Make Bounds


Opcode/Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description
F3 0F 1B /r BNDMK bnd, m32 | RM | NE/V | MPX | Make lower and upper bounds from m32 and store them in bnd.
F3 0F 1B /r BNDMK bnd, m64 | RM | V/NE | MPX | Make lower and upper bounds from m64 and store them in bnd.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA


Description 

Makes bounds from the second operand and stores the lower and upper bounds in the bound register bnd. The second operand must be a memory operand. The content of the base register from the memory operand is stored in the lower bound bnd.LB. The 1's complement of the effective address of m32/m64 is stored in the upper bound bnd.UB. Computation of m32/m64 has identical behavior to LEA.

This instruction does not cause any memory access, and does not read or write any flags.

If the instruction does not specify a base register, the lower bound will be zero. The reg-reg form of this instruction retains legacy behavior (NOP).

In 64-bit mode, a RIP-relative form of this instruction causes #UD.

Operation

BND.LB ← SRCMEM.base;
IF 64-bit mode Then
    BND.UB ← NOT(LEA.64_bits(SRCMEM));
ELSE
    BND.UB ← Zero_Extend.64_bits(NOT(LEA.32_bits(SRCMEM)));
FI;
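A C sketch of the two branches of the pseudocode follows. The names are hypothetical; base and ea stand for the base-register value and the LEA result of the memory operand. Note the order in the 32-bit branch: the 32-bit address is inverted first and only then zero-extended, so the upper 32 bits of UB stay zero.

```c
#include <stdint.h>

/* Hypothetical software model of an MPX bound register. */
struct bnd_reg { uint64_t lb, ub; };

/* 64-bit mode: LB <- base register, UB <- NOT(LEA.64_bits(mem)). */
static struct bnd_reg bndmk64_model(uint64_t base, uint64_t ea)
{
    struct bnd_reg b = { base, ~ea };
    return b;
}

/* Outside 64-bit mode: invert the 32-bit LEA result, then zero-extend it. */
static struct bnd_reg bndmk32_model(uint32_t base, uint32_t ea)
{
    struct bnd_reg b = { base, (uint64_t)(uint32_t)~ea };
    return b;
}
```

Assuming the upper bound is inclusive, as the BNDCU comparison (fault only if address > upper bound) implies, a hypothetical n-byte object at address p would use a memory operand whose effective address is p + n - 1.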

Intel C/C++ Compiler Intrinsic Equivalent 

BNDMK: void * _bnd_set_ptr_bounds(const void * q, size_t size);

Flags Affected 

None 

Protected Mode Exceptions 

#UD If ModRM is RIP relative. 

If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

Real-Address Mode Exceptions 

#UD If ModRM is RIP relative. 

If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 
Virtual-8086 Mode Exceptions

#UD If ModRM is RIP relative. 

If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#UD If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#SS(0) If the memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

Same exceptions as in protected mode. 
BNDMOV—Move Bounds


Opcode/Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description
66 0F 1A /r BNDMOV bnd1, bnd2/m64 | RM | NE/V | MPX | Move lower and upper bound from bnd2/m64 to bound register bnd1.
66 0F 1A /r BNDMOV bnd1, bnd2/m128 | RM | V/NE | MPX | Move lower and upper bound from bnd2/m128 to bound register bnd1.
66 0F 1B /r BNDMOV bnd1/m64, bnd2 | MR | NE/V | MPX | Move lower and upper bound from bnd2 to bnd1/m64.
66 0F 1B /r BNDMOV bnd1/m128, bnd2 | MR | V/NE | MPX | Move lower and upper bound from bnd2 to bound register bnd1/m128.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3
RM | ModRM:reg (w) | ModRM:r/m (r) | NA
MR | ModRM:r/m (w) | ModRM:reg (r) | NA


Description 

BNDMOV moves a pair of lower and upper bound values from the source operand (the second operand) to the destination (the first operand). Each operation is a 128-bit move. The exceptions are the same as for the MOV instruction. The memory format for loading and storing bounds in 64-bit mode is shown in Figure 3-5.



This instruction does not change flags. 

Operation

BNDMOV register to register
DEST.LB ← SRC.LB;
DEST.UB ← SRC.UB;

BNDMOV from memory
IF 64-bit mode THEN
    DEST.LB ← LOAD_QWORD(SRC);
    DEST.UB ← LOAD_QWORD(SRC+8);
ELSE
    DEST.LB ← LOAD_DWORD_ZERO_EXT(SRC);
    DEST.UB ← LOAD_DWORD_ZERO_EXT(SRC+4);
FI;

BNDMOV to memory
IF 64-bit mode THEN
    DEST[63:0] ← SRC.LB;
    DEST[127:64] ← SRC.UB;
ELSE
    DEST[31:0] ← SRC.LB;
    DEST[63:32] ← SRC.UB;
FI;
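The memory image that BNDMOV reads and writes can be described with two C structs, with compile-time checks pinning the offsets from the pseudocode (UB at +8 in 64-bit mode, +4 otherwise). The type and function names are hypothetical, and the field order assumes the little-endian x86 layout. The load function mirrors LOAD_DWORD_ZERO_EXT for the 32-bit case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit mode memory operand (m128): LB at offset 0, UB at offset 8. */
struct bnd_mem64 { uint64_t lb; uint64_t ub; };

/* Outside 64-bit mode (m64): LB at offset 0, UB at offset 4. */
struct bnd_mem32 { uint32_t lb; uint32_t ub; };

static_assert(offsetof(struct bnd_mem64, ub) == 8, "UB follows LB at +8");
static_assert(sizeof(struct bnd_mem64) == 16, "128-bit operand");
static_assert(offsetof(struct bnd_mem32, ub) == 4, "UB follows LB at +4");
static_assert(sizeof(struct bnd_mem32) == 8, "64-bit operand");

/* Hypothetical software model of a bound register (64-bit fields). */
struct bnd_reg { uint64_t lb, ub; };

/* BNDMOV from memory outside 64-bit mode: each dword is zero-extended. */
static struct bnd_reg bndmov_load32(const struct bnd_mem32 *m)
{
    struct bnd_reg r = { m->lb, m->ub };
    return r;
}
```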

Intel C/C++ Compiler Intrinsic Equivalent 

BNDMOV: void * _bnd_copy_ptr_bounds(const void *q, const void *r)

Flags Affected 

None 


Protected Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

#SS(0) If the memory operand effective address is outside the SS segment limit.

#GP(0) If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the destination operand points to a non-writable segment.

If the DS, ES, FS, or GS segment register contains a NULL segment selector.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while CPL is 3.

#PF(fault code) If a page fault occurs.


Real-Address Mode Exceptions 

#UD If the LOCK prefix is used but the destination is not a memory operand. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

#GP(0) If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If the memory operand effective address is outside the SS segment limit. 
Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled.

If 16-bit addressing is used.

#GP(0) If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If the memory operand effective address is outside the SS segment limit.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while CPL is 3.

#PF(fault code) If a page fault occurs.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#UD If the LOCK prefix is used but the destination is not a memory operand.

If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#SS(0) If the memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while CPL is 3.

#PF(fault code) If a page fault occurs.
BNDSTX—Store Extended Bounds Using Address Translation


Opcode/Instruction | Op/En | 64/32-bit Mode Support | CPUID Feature Flag | Description
0F 1B /r BNDSTX mib, bnd | MR | V/V | MPX | Store the bounds in bnd and the pointer value in the index register of mib to a bound table entry (BTE) with address translation using the base of mib.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3
MR | SIB.base (r): Address of pointer; SIB.index (r) | ModRM:reg (r) | NA


Description 

BNDSTX uses the linear address constructed from the displacement and base register of the SIB-addressing form 
of the memory operand (mib) to perform address translation to store to a bound table entry. The bounds in the 
source operand bnd are written to the lower and upper bounds in the BTE. The content of the index register of mib 
is written to the pointer value field in the BTE. 

This instruction does not cause memory access to the linear address of mib nor the effective address referenced by 
the base, and does not read or write any flags. 

Segment overrides apply to the linear address computation with the base of mib, and are used during address translation to generate the address of the bound table entry. By default, the address of the BTE is assumed to be a linear address. No segmentation checks are performed on the base of mib.

The base of mib will not be checked for canonical address violation as it does not access memory. 

Any encoding of this instruction that does not specify base or index register will treat those registers as zero 
(constant). The reg-reg form of this instruction will remain a NOP. 

The scale field of the SIB byte has no effect on these instructions and is ignored. 

The bound register may be partially updated on memory faults. The order in which memory operands are loaded is 
implementation specific. 

Operation

base ← mib.SIB.base ? mib.SIB.base + Disp : 0;
ptr_value ← mib.SIB.index ? mib.SIB.index : 0;

Outside 64-bit mode

A_BDE[31:0] ← (Zero_extend32(base[31:12] << 2) + (BNDCFG[31:12] << 12));
A_BT[31:0] ← LoadFrom(A_BDE);
IF A_BT[0] equal 0 Then
    BNDSTATUS ← A_BDE | 02H;
    #BR;
FI;
A_DEST[31:0] ← (Zero_extend32(base[11:2] << 4) + (A_BT[31:2] << 2)); // address of Bound table entry
A_DEST[8][31:0] ← ptr_value;
A_DEST[0][31:0] ← BND.LB;
A_DEST[4][31:0] ← BND.UB;

In 64-bit mode

A_BDE[63:0] ← (Zero_extend64(base[47+MAWA:20] << 3) + (BNDCFG[63:20] << 12)); (* see footnote 1 *)
A_BT[63:0] ← LoadFrom(A_BDE);
IF A_BT[0] equal 0 Then
    BNDSTATUS ← A_BDE | 02H;
    #BR;
FI;
A_DEST[63:0] ← (Zero_extend64(base[19:3] << 5) + (A_BT[63:3] << 3)); // address of Bound table entry
A_DEST[16][63:0] ← ptr_value;
A_DEST[0][63:0] ← BND.LB;
A_DEST[8][63:0] ← BND.UB;
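In 64-bit mode the pseudocode writes LB at +0, UB at +8, and the pointer value at +16, with entries 32 bytes apart (base[19:3] << 5). The C sketch below models one such entry and pairs BNDSTX with the matching BNDLDX check, showing how a stale pointer yields INIT bounds. All names are hypothetical, and treating the fourth quadword as reserved is an assumption of the sketch.

```c
#include <stdint.h>

/* One 64-bit bound table entry as implied by the pseudocode offsets.
 * Assumption: the last quadword (32-byte stride) is reserved. */
struct bte64 { uint64_t lb, ub, ptr, reserved; };

/* Hypothetical software model of a bound register. */
struct bnd_reg { uint64_t lb, ub; };

/* BNDSTX: store the bounds and the pointer value (index register of mib). */
static void bndstx_model(struct bte64 *e, struct bnd_reg bnd, uint64_t ptr_value)
{
    e->ptr = ptr_value;
    e->lb = bnd.lb;
    e->ub = bnd.ub;
}

/* BNDLDX: return the stored bounds only if the pointer value still matches;
 * otherwise return INIT bounds (LB = 0, UB = 0, i.e. all of memory). */
static struct bnd_reg bndldx_model(const struct bte64 *e, uint64_t ptr_value)
{
    struct bnd_reg r = { 0u, 0u };
    if (e->ptr == ptr_value) {
        r.lb = e->lb;
        r.ub = e->ub;
    }
    return r;
}
```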

Intel C/C++ Compiler Intrinsic Equivalent 

BNDSTX: _bnd_store_ptr_bounds(const void **ptr_addr, const void *ptr_val); 

Flags Affected 

None 

Protected Mode Exceptions 

#BR If the bound directory entry is invalid. 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 67H prefix is not used and CS.D=0.

If 67H prefix is used and CS.D=1.

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit.

If DS register contains a NULL segment selector.

If the destination operand points to a non-writable segment.

#PF(fault code) If a page fault occurs.

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit. 

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used. 

If ModRM.r/m encodes BND4-BND7 when Intel MPX is enabled. 

If 16-bit addressing is used. 

#GP(0) If a destination effective address of the Bound Table entry is outside the DS segment limit. 

#PF(fault code) If a page fault occurs. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


1. If CPL < 3, the supervisor MAWA (MAWAS) is used; this value is 0. If CPL = 3, the user MAWA (MAWAU) is used; this value is enumerated in CPUID.(EAX=07H,ECX=0H):ECX.MAWAU[bits 21:17]. See Section 17.3.1 of Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.
64-Bit Mode Exceptions

#BR If the bound directory entry is invalid.

#UD If ModRM is RIP relative.

If the LOCK prefix is used.

If ModRM.r/m and REX encode BND4-BND15 when Intel MPX is enabled.

#GP(0) If the memory address (A_BDE or A_BTE) is in a non-canonical form.

If the destination operand points to a non-writable segment.

#PF(fault code) If a page fault occurs.
BOUND—Check Array Index Against Bounds


Opcode | Instruction | Op/En | 64-bit Mode | Compat/Leg Mode | Description
62 /r | BOUND r16, m16&16 | RM | Invalid | Valid | Check if r16 (array index) is within bounds specified by m16&16.
62 /r | BOUND r32, m32&32 | RM | Invalid | Valid | Check if r32 (array index) is within bounds specified by m32&32.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r) | ModRM:r/m (r) | NA | NA


Description 

BOUND determines if the first operand (array index) is within the bounds of an array specified by the second operand (bounds operand). The array index is a signed integer located in a register. The bounds operand is a memory location that contains a pair of signed doubleword integers (when the operand-size attribute is 32) or a pair of signed word integers (when the operand-size attribute is 16). The first doubleword (or word) is the lower bound of the array and the second doubleword (or word) is the upper bound of the array. The array index must be greater than or equal to the lower bound and less than or equal to the upper bound plus the operand size in bytes. If the index is not within bounds, a BOUND range exceeded exception (#BR) is signaled. When this exception is generated, the saved return instruction pointer points to the BOUND instruction.

The bounds limit data structure (two words or doublewords containing the lower and upper limits of the array) is usually placed just before the array itself, making the limits addressable via a constant offset from the beginning of the array. Because the address of the array will already be present in a register, this practice avoids extra bus cycles to obtain the effective address of the array bounds.

This instruction executes as described in compatibility mode and legacy mode. It is not valid in 64-bit mode. 

Operation

IF 64-bit Mode
    THEN
        #UD;
    ELSE
        IF (ArrayIndex < LowerBound OR ArrayIndex > UpperBound)
        (* Below lower bound or above upper bound *)
            THEN #BR; FI;
FI;
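The signed comparison in the Operation pseudocode can be sketched in C for the 32-bit form. This follows the pseudocode (fault when the index is below LowerBound or above UpperBound) rather than the "upper bound plus the operand size" wording in the description; the function name is hypothetical, and the two-element array stands in for the m32&32 bounds operand (lower bound first).

```c
#include <stdbool.h>
#include <stdint.h>

/* Model of BOUND r32, m32&32 following the Operation pseudocode.
 * bounds[0] is the lower bound, bounds[1] the upper bound (signed dwords).
 * Returns true when the instruction would signal #BR. */
static bool bound32_would_fault(int32_t index, const int32_t bounds[2])
{
    int32_t lower = bounds[0];
    int32_t upper = bounds[1];
    return index < lower || index > upper;  /* signed comparisons */
}
```

Both ends of the range are inclusive in this model, and negative bounds are handled because the operands are signed.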

Flags Affected 

None. 

Protected Mode Exceptions 

#BR If the bounds test fails. 

#UD If second operand is not a memory location. 

If the LOCK prefix is used. 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
Real-Address Mode Exceptions

#BR If the bounds test fails.

#UD If the second operand is not a memory location.

If the LOCK prefix is used.

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions

#BR If the bounds test fails.

#UD If the second operand is not a memory location.

If the LOCK prefix is used.

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#UD If in 64-bit mode. 
BSF—Bit Scan Forward


Opcode            Instruction       Op/En   64-bit Mode   Compat/Leg Mode   Description
0F BC /r          BSF r16, r/m16    RM      Valid         Valid             Bit scan forward on r/m16.
0F BC /r          BSF r32, r/m32    RM      Valid         Valid             Bit scan forward on r/m32.
REX.W + 0F BC /r  BSF r64, r/m64    RM      Valid         N.E.              Bit scan forward on r/m64.


Instruction Operand Encoding

Op/En   Operand 1        Operand 2        Operand 3   Operand 4
RM      ModRM:reg (w)    ModRM:r/m (r)    NA          NA


Description 

Searches the source operand (second operand) for the least significant set bit (1 bit). If a least significant 1 bit is 
found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a 
memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source 
operand. If the content of the source operand is 0, the content of the destination operand is undefined. 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

IF SRC = 0
    THEN
        ZF <- 1;
        DEST is undefined;
    ELSE
        ZF <- 0;
        temp <- 0;
        WHILE Bit(SRC, temp) = 0
        DO
            temp <- temp + 1;
        OD;
        DEST <- temp;
FI; 
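The scan loop in the Operation pseudocode can be sketched in Python as follows. The function name and the (DEST, ZF) return tuple are assumptions made for illustration, and None stands in for the undefined destination when the source is 0; the model says nothing about what a particular processor actually leaves in DEST in that case.

```python
from typing import Optional, Tuple


def bsf(src: int, operand_size: int = 32) -> Tuple[Optional[int], int]:
    """Model of BSF: return (DEST, ZF); DEST is None when it is undefined."""
    src &= (1 << operand_size) - 1     # scan only the operand-size bits
    if src == 0:
        return None, 1                 # ZF <- 1; DEST is undefined
    temp = 0
    while ((src >> temp) & 1) == 0:    # WHILE Bit(SRC, temp) = 0
        temp += 1
    return temp, 0                     # ZF <- 0; DEST <- temp
```

For a nonzero value the result agrees with the Python idiom `(src & -src).bit_length() - 1`, which isolates the lowest set bit.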

Flags Affected 

The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags
are undefined.


Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.


3-108 Vol. 2A 


INSTRUCTION SET REFERENCE, A-L 


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
BSR—Bit Scan Reverse


Opcode            Instruction       Op/En   64-bit Mode   Compat/Leg Mode   Description
0F BD /r          BSR r16, r/m16    RM      Valid         Valid             Bit scan reverse on r/m16.
0F BD /r          BSR r32, r/m32    RM      Valid         Valid             Bit scan reverse on r/m32.
REX.W + 0F BD /r  BSR r64, r/m64    RM      Valid         N.E.              Bit scan reverse on r/m64.


Instruction Operand Encoding

Op/En   Operand 1        Operand 2        Operand 3   Operand 4
RM      ModRM:reg (w)    ModRM:r/m (r)    NA          NA


Description 

Searches the source operand (second operand) for the most significant set bit (1 bit). If a most significant 1 bit is 
found, its bit index is stored in the destination operand (first operand). The source operand can be a register or a 
memory location; the destination operand is a register. The bit index is an unsigned offset from bit 0 of the source 
operand. If the content of the source operand is 0, the content of the destination operand is undefined.

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

IF SRC = 0
    THEN
        ZF <- 1;
        DEST is undefined;
    ELSE
        ZF <- 0;
        temp <- OperandSize - 1;
        WHILE Bit(SRC, temp) = 0
        DO
            temp <- temp - 1;
        OD;
        DEST <- temp;
FI; 
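A Python sketch of the downward scan follows the same shape. As with any model here, the function name and the (DEST, ZF) tuple are assumptions, and None marks the undefined destination for a zero source.

```python
from typing import Optional, Tuple


def bsr(src: int, operand_size: int = 32) -> Tuple[Optional[int], int]:
    """Model of BSR: return (DEST, ZF); DEST is None when it is undefined."""
    src &= (1 << operand_size) - 1     # scan only the operand-size bits
    if src == 0:
        return None, 1                 # ZF <- 1; DEST is undefined
    temp = operand_size - 1            # temp <- OperandSize - 1
    while ((src >> temp) & 1) == 0:    # WHILE Bit(SRC, temp) = 0
        temp -= 1
    return temp, 0                     # ZF <- 0; DEST <- temp
```

For a nonzero value the result equals `src.bit_length() - 1` in Python, that is, the index of the most significant set bit.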

Flags Affected 

The ZF flag is set to 1 if the source operand is 0; otherwise, the ZF flag is cleared. The CF, OF, SF, AF, and PF flags
are undefined.


Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
BSWAP—Byte Swap


Opcode              Instruction   Op/En   64-bit Mode   Compat/Leg Mode   Description
0F C8+rd            BSWAP r32     O       Valid*        Valid             Reverses the byte order of a 32-bit register.
REX.W + 0F C8+rd    BSWAP r64     O       Valid         N.E.              Reverses the byte order of a 64-bit register.


NOTES:

* See the IA-32 Architecture Legacy Compatibility section below.


Instruction Operand Encoding

Op/En   Operand 1            Operand 2   Operand 3   Operand 4
O       opcode + rd (r, w)   NA          NA          NA


Description 

Reverses the byte order of a 32-bit or 64-bit (destination) register. This instruction is provided for converting little- 
endian values to big-endian format and vice versa. To swap bytes in a word value (16-bit register), use the XCHG 
instruction. When the BSWAP instruction references a 16-bit register, the result is undefined. 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

IA-32 Architecture Legacy Compatibility 

The BSWAP instruction is not supported on IA-32 processors earlier than the Intel486™ processor family. For 
compatibility with this instruction, software should include functionally equivalent code for execution on Intel 
processors earlier than the Intel486 processor family. 

Operation 

TEMP <- DEST
IF 64-bit mode AND OperandSize = 64
    THEN
        DEST[7:0] <- TEMP[63:56];
        DEST[15:8] <- TEMP[55:48];
        DEST[23:16] <- TEMP[47:40];
        DEST[31:24] <- TEMP[39:32];
        DEST[39:32] <- TEMP[31:24];
        DEST[47:40] <- TEMP[23:16];
        DEST[55:48] <- TEMP[15:8];
        DEST[63:56] <- TEMP[7:0];
    ELSE
        DEST[7:0] <- TEMP[31:24];
        DEST[15:8] <- TEMP[23:16];
        DEST[23:16] <- TEMP[15:8];
        DEST[31:24] <- TEMP[7:0];
FI; 
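The byte mirroring in the Operation pseudocode can be sketched in Python with a loop over byte positions. The function name is hypothetical, and raising ValueError for a 16-bit operand is a modeling choice standing in for the undefined result described for that case.

```python
def bswap(dest: int, operand_size: int = 32) -> int:
    """Model of BSWAP: reverse the bytes of a 32- or 64-bit value."""
    if operand_size not in (32, 64):
        # The 16-bit result is undefined; refusing it is a modeling choice.
        raise ValueError("BSWAP is only defined for 32- and 64-bit operands")
    nbytes = operand_size // 8
    temp = dest & ((1 << operand_size) - 1)        # TEMP <- DEST
    result = 0
    for i in range(nbytes):
        byte = (temp >> (8 * i)) & 0xFF            # byte i of TEMP
        result |= byte << (8 * (nbytes - 1 - i))   # lands in the mirrored byte of DEST
    return result
```

Applying the function twice returns the original value, and for a 32-bit value it agrees with `int.from_bytes(x.to_bytes(4, "little"), "big")`, which is the little-endian to big-endian conversion the instruction is meant for.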

Flags Affected 

None. 

Exceptions (All Operating Modes) 

#UD If the LOCK prefix is used. 
BT—Bit Test


Opcode              Instruction       Op/En   64-bit Mode   Compat/Leg Mode   Description
0F A3 /r            BT r/m16, r16     MR      Valid         Valid             Store selected bit in CF flag.
0F A3 /r            BT r/m32, r32     MR      Valid         Valid             Store selected bit in CF flag.
REX.W + 0F A3 /r    BT r/m64, r64     MR      Valid         N.E.              Store selected bit in CF flag.
0F BA /4 ib         BT r/m16, imm8    MI      Valid         Valid             Store selected bit in CF flag.
0F BA /4 ib         BT r/m32, imm8    MI      Valid         Valid             Store selected bit in CF flag.
REX.W + 0F BA /4 ib BT r/m64, imm8    MI      Valid         N.E.              Store selected bit in CF flag.


Instruction Operand Encoding

Op/En   Operand 1          Operand 2        Operand 3   Operand 4
MR      ModRM:r/m (r)      ModRM:reg (r)    NA          NA
MI      ModRM:r/m (r)      imm8             NA          NA


Description 

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by 
the bit offset (specified by the second operand) and stores the value of the bit in the CF flag. The bit base operand 
can be a register or a memory location; the bit offset operand can be a register or an immediate value: 

• If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset 
operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit 
mode). 

• If the bit base operand specifies a memory location, the operand represents the address of the byte in memory 
that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be 
referenced by the offset operand depends on the operand size. 
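The two cases in the list can be sketched in Python as follows. Both function names are hypothetical, `mem` is a byte sequence standing in for memory, and `base` is an index into it. The memory case assumes that a (possibly negative) offset selects byte base + floor(offset / 8) and bit (offset mod 8) within that byte; the Bit(BitBase, BitOffset) definition gives the exact offset ranges each operand size allows.

```python
def bit_of_register(reg: int, bit_offset: int, operand_size: int) -> int:
    """Register bit base: the offset is taken modulo 16, 32, or 64."""
    # Python's % with a positive modulus is never negative, so a negative
    # offset wraps the same way its low-order two's-complement bits would.
    return (reg >> (bit_offset % operand_size)) & 1


def bit_of_memory(mem: bytes, base: int, bit_offset: int) -> int:
    """Memory bit base: mem[base] is the byte that holds bit 0 of the string."""
    byte_index = base + (bit_offset >> 3)   # >> 3 floors toward -infinity, also for negatives
    return (mem[byte_index] >> (bit_offset & 7)) & 1
```

With a register bit base, offset 34 on a 32-bit operand selects the same bit as offset 2; with a memory bit base, offset 8 moves on to bit 0 of the next byte instead of wrapping.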

See also: Bit(BitBase, BitOffset) on page 3-11. 

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination
with the displacement field of the memory operand. In this case, the low-order 3 or 5 bits (3 for 16-bit operands,
5 for 32-bit operands) of the immediate bit offset are stored in the immediate bit offset field, and the high-order
bits are shifted and combined with the byte displacement in the addressing mode by the assembler. The
processor will ignore the high order bits if they are not zero.

When accessing a bit in memory, the processor may access 4 bytes starting from the memory address for a 32-bit
operand size, using the following relationship:

Effective Address + (4 * (BitOffset DIV 32))

Or, it may access 2 bytes starting from the memory address for a 16-bit operand, using this relationship:

Effective Address + (2 * (BitOffset DIV 16))

It may do so even when only a single byte needs to be accessed to reach the given bit. When using this bit
addressing mechanism, software should avoid referencing areas of memory close to address space holes. In particular,
it should avoid references to memory-mapped I/O registers. Instead, software should use the MOV instructions
to load from or store to these addresses, and use the register form of these instructions to manipulate the
data. 
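To see why the access can reach beyond the byte that holds the bit, the two relationships can be evaluated in Python. The helper names are hypothetical, and the sketch assumes DIV rounds toward negative infinity for negative offsets (Python's `//` operator), which the text does not state.

```python
from typing import Tuple


def bt_access_range(effective_address: int, bit_offset: int,
                    operand_size: int) -> Tuple[int, int]:
    """Half-open byte range [start, end) that BT may read (16- or 32-bit operand)."""
    width = operand_size // 8                       # 2 or 4 bytes
    start = effective_address + width * (bit_offset // operand_size)
    return start, start + width


def byte_holding_bit(effective_address: int, bit_offset: int) -> int:
    """Address of the single byte that actually contains the selected bit."""
    return effective_address + bit_offset // 8
```

For example, bit offset 37 from address 0x1000 lives in byte 0x1004, but a 32-bit access may cover 0x1004 through 0x1007, so three neighboring bytes are read as well. If one of those bytes is a memory-mapped I/O register, the read itself can have side effects.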

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64-bit
operands. See the summary chart at the beginning of this section for encoding data and limits.

Operation 

CF <- Bit(BitBase, BitOffset); 
Flags Affected

The CF flag contains the value of the selected bit. The ZF flag is unaffected. The OF, SF, AF, and PF flags are 
undefined. 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
BTC—Bit Test and Complement


Opcode              Instruction       Op/En   64-bit Mode   Compat/Leg Mode   Description
0F BB /r            BTC r/m16, r16    MR      Valid         Valid             Store selected bit in CF flag and complement.
0F BB /r            BTC r/m32, r32    MR      Valid         Valid             Store selected bit in CF flag and complement.
REX.W + 0F BB /r    BTC r/m64, r64    MR      Valid         N.E.              Store selected bit in CF flag and complement.
0F BA /7 ib         BTC r/m16, imm8   MI      Valid         Valid             Store selected bit in CF flag and complement.
0F BA /7 ib         BTC r/m32, imm8   MI      Valid         Valid             Store selected bit in CF flag and complement.
REX.W + 0F BA /7 ib BTC r/m64, imm8   MI      Valid         N.E.              Store selected bit in CF flag and complement.


Instruction Operand Encoding

Op/En   Operand 1          Operand 2        Operand 3   Operand 4
MR      ModRM:r/m (r, w)   ModRM:reg (r)    NA          NA
MI      ModRM:r/m (r, w)   imm8             NA          NA


Description 

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by 
the bit offset operand (second operand), stores the value of the bit in the CF flag, and complements the selected 
bit in the bit string. The bit base operand can be a register or a memory location; the bit offset operand can be a 
register or an immediate value: 

• If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset 
operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit 
mode). This allows any bit position to be selected. 

• If the bit base operand specifies a memory location, the operand represents the address of the byte in memory 
that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be 
referenced by the offset operand depends on the operand size. 

See also: Bit(BitBase, BitOffset) on page 3-11. 

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination
with the displacement field of the memory operand. See "BT—Bit Test" in this chapter for more information on
this addressing mechanism.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

CF <- Bit(BitBase, BitOffset); 

Bit(BitBase, BitOffset) <- NOT Bit(BitBase, BitOffset);

Flags Affected 

The CF flag contains the value of the selected bit before it is complemented. The ZF flag is unaffected. The OF, SF, 
AF, and PF flags are undefined. 
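For the register form, BTC can be modeled in Python as below. The function name and the (result, CF) return convention are assumptions; the sketch covers only a register bit base, where the offset is taken modulo the operand size, and it does not model the LOCK-prefixed atomic memory form.

```python
from typing import Tuple


def btc_reg(bit_base: int, bit_offset: int, operand_size: int = 32) -> Tuple[int, int]:
    """Model of register-form BTC: return (new register value, CF)."""
    pos = bit_offset % operand_size     # register bit base: offset modulo 16, 32, or 64
    cf = (bit_base >> pos) & 1          # CF <- Bit(BitBase, BitOffset)
    return bit_base ^ (1 << pos), cf    # Bit(BitBase, BitOffset) <- NOT Bit(BitBase, BitOffset)
```

Because the bit is complemented, running the same BTC twice restores the original register value, and CF reports the bit as it was before each run.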


Protected Mode Exceptions

#GP(0) If the destination operand points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 
BTR—Bit Test and Reset


Opcode              Instruction       Op/En   64-bit Mode   Compat/Leg Mode   Description
0F B3 /r            BTR r/m16, r16    MR      Valid         Valid             Store selected bit in CF flag and clear.
0F B3 /r            BTR r/m32, r32    MR      Valid         Valid             Store selected bit in CF flag and clear.
REX.W + 0F B3 /r    BTR r/m64, r64    MR      Valid         N.E.              Store selected bit in CF flag and clear.
0F BA /6 ib         BTR r/m16, imm8   MI      Valid         Valid             Store selected bit in CF flag and clear.
0F BA /6 ib         BTR r/m32, imm8   MI      Valid         Valid             Store selected bit in CF flag and clear.
REX.W + 0F BA /6 ib BTR r/m64, imm8   MI      Valid         N.E.              Store selected bit in CF flag and clear.


Instruction Operand Encoding

Op/En   Operand 1          Operand 2        Operand 3   Operand 4
MR      ModRM:r/m (r, w)   ModRM:reg (r)    NA          NA
MI      ModRM:r/m (r, w)   imm8             NA          NA


Description 

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by 
the bit offset operand (second operand), stores the value of the bit in the CF flag, and clears the selected bit in the 
bit string to 0. The bit base operand can be a register or a memory location; the bit offset operand can be a register 
or an immediate value: 

• If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset 
operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit 
mode). This allows any bit position to be selected. 

• If the bit base operand specifies a memory location, the operand represents the address of the byte in memory 
that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be 
referenced by the offset operand depends on the operand size. 

See also: Bit(BitBase, BitOffset) on page 3-11. 

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination
with the displacement field of the memory operand. See "BT—Bit Test" in this chapter for more information on
this addressing mechanism.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

CF <- Bit(BitBase, BitOffset); 

Bit(BitBase, BitOffset) <- 0;

Flags Affected 

The CF flag contains the value of the selected bit before it is cleared. The ZF flag is unaffected. The OF, SF, AF, and 
PF flags are undefined. 
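A Python sketch of the memory form follows, treating a bytearray as the bit string with bit 0 of the string in bit 0 of its first byte. The name `btr_mem` is hypothetical, offsets are assumed non-negative, and the sketch is not atomic, whereas a LOCK-prefixed BTR on memory can be.

```python
def btr_mem(bits: bytearray, bit_offset: int) -> int:
    """Model of memory-form BTR on a bit string starting at bits[0]; returns CF."""
    byte_index, bit = bit_offset >> 3, bit_offset & 7
    cf = (bits[byte_index] >> bit) & 1         # CF <- Bit(BitBase, BitOffset)
    bits[byte_index] &= ~(1 << bit) & 0xFF     # Bit(BitBase, BitOffset) <- 0
    return cf
```

The model rewrites only the byte that holds the bit; as the BT section notes, the processor itself may access a wider unit of memory.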
Protected Mode Exceptions

#GP(0) If the destination operand points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 
BTS—Bit Test and Set


Opcode              Instruction       Op/En   64-bit Mode   Compat/Leg Mode   Description
0F AB /r            BTS r/m16, r16    MR      Valid         Valid             Store selected bit in CF flag and set.
0F AB /r            BTS r/m32, r32    MR      Valid         Valid             Store selected bit in CF flag and set.
REX.W + 0F AB /r    BTS r/m64, r64    MR      Valid         N.E.              Store selected bit in CF flag and set.
0F BA /5 ib         BTS r/m16, imm8   MI      Valid         Valid             Store selected bit in CF flag and set.
0F BA /5 ib         BTS r/m32, imm8   MI      Valid         Valid             Store selected bit in CF flag and set.
REX.W + 0F BA /5 ib BTS r/m64, imm8   MI      Valid         N.E.              Store selected bit in CF flag and set.


Instruction Operand Encoding

Op/En   Operand 1          Operand 2        Operand 3   Operand 4
MR      ModRM:r/m (r, w)   ModRM:reg (r)    NA          NA
MI      ModRM:r/m (r, w)   imm8             NA          NA


Description 

Selects the bit in a bit string (specified with the first operand, called the bit base) at the bit-position designated by 
the bit offset operand (second operand), stores the value of the bit in the CF flag, and sets the selected bit in the 
bit string to 1. The bit base operand can be a register or a memory location; the bit offset operand can be a register 
or an immediate value: 

• If the bit base operand specifies a register, the instruction takes the modulo 16, 32, or 64 of the bit offset 
operand (modulo size depends on the mode and register size; 64-bit operands are available only in 64-bit 
mode). This allows any bit position to be selected. 

• If the bit base operand specifies a memory location, the operand represents the address of the byte in memory 
that contains the bit base (bit 0 of the specified byte) of the bit string. The range of the bit position that can be 
referenced by the offset operand depends on the operand size. 

See also: Bit(BitBase, BitOffset) on page 3-11. 

Some assemblers support immediate bit offsets larger than 31 by using the immediate bit offset field in combination
with the displacement field of the memory operand. See "BT—Bit Test" in this chapter for more information on
this addressing mechanism.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.R permits 
access to additional registers (R8-R15). Using a REX prefix in the form of REX.W promotes operation to 64 bits. See 
the summary chart at the beginning of this section for encoding data and limits. 

Operation 

CF <- Bit(BitBase, BitOffset); 

Bit(BitBase, BitOffset) <- 1;

Flags Affected 

The CF flag contains the value of the selected bit before it is set. The ZF flag is unaffected. The OF, SF, AF, and PF 
flags are undefined. 
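Because CF receives the old value of the bit, BTS behaves like a test-and-set on a bitmap: CF = 0 means the bit was free and has now been claimed. A minimal Python sketch of the memory form follows, with the hypothetical name `bts_mem`, a bytearray standing in for the bit string, non-negative offsets assumed, and no atomicity (unlike a LOCK-prefixed BTS).

```python
def bts_mem(bits: bytearray, bit_offset: int) -> int:
    """Model of memory-form BTS on a bit string starting at bits[0]; returns CF."""
    byte_index, bit = bit_offset >> 3, bit_offset & 7
    cf = (bits[byte_index] >> bit) & 1   # CF <- Bit(BitBase, BitOffset)
    bits[byte_index] |= 1 << bit         # Bit(BitBase, BitOffset) <- 1
    return cf
```

A second BTS on the same offset returns CF = 1, which is how a caller would detect that the slot was already taken.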
Protected Mode Exceptions

#GP(0) If the destination operand points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 
BZHI — Zero High Bits Starting with Specified Bit Position


Opcode/Instruction          Op/En   64/32-bit Mode   CPUID Feature Flag   Description
VEX.NDS.LZ.0F38.W0 F5 /r    RMV     V/V              BMI2                 Zero bits in r/m32 starting with the position in r32b, write result to r32a.
BZHI r32a, r/m32, r32b
VEX.NDS.LZ.0F38.W1 F5 /r    RMV     V/N.E.           BMI2                 Zero bits in r/m64 starting with the position in r64b, write result to r64a.
BZHI r64a, r/m64, r64b


Instruction Operand Encoding

Op/En   Operand 1        Operand 2        Operand 3      Operand 4
RMV     ModRM:reg (w)    ModRM:r/m (r)    VEX.vvvv (r)   NA


Description 

BZHI copies the bits of the first source operand (the second operand) into the destination operand (the first
operand) and clears the higher bits in the destination according to the INDEX value specified by the second source
operand (the third operand). The INDEX is specified by bits 7:0 of the second source operand. The INDEX value is
saturated at the value of OperandSize - 1. CF is set if the number contained in the 8 low bits of the third operand
is greater than OperandSize - 1.

This instruction is not supported in real mode and virtual-8086 mode. The operand size is always 32 bits if not in
64-bit mode. In 64-bit mode operand size 64 requires VEX.W1. VEX.W1 is ignored in non-64-bit modes. An
attempt to execute this instruction with VEX.L not equal to 0 will cause #UD.

Operation 

N <- SRC2[7:0]
DEST <- SRC1
IF (N < OperandSize)
    DEST[OperandSize-1:N] <- 0
FI
IF (N > OperandSize - 1)
    CF <- 1
ELSE
    CF <- 0
FI 
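The Operation pseudocode and the flag rules can be sketched in Python as below. The function name and the returned tuple are illustrative assumptions; ZF and SF are derived from the result as the Flags Affected text describes, OF (always cleared) is omitted, and the undefined AF and PF are not modeled.

```python
from typing import Tuple


def bzhi(src1: int, src2: int, operand_size: int = 32) -> Tuple[int, int, int, int]:
    """Model of BZHI: return (DEST, CF, ZF, SF)."""
    n = src2 & 0xFF                           # N <- SRC2[7:0]
    dest = src1 & ((1 << operand_size) - 1)   # DEST <- SRC1
    if n < operand_size:
        dest &= (1 << n) - 1                  # DEST[OperandSize-1:N] <- 0
    cf = 1 if n > operand_size - 1 else 0     # index beyond the operand width
    zf = 1 if dest == 0 else 0
    sf = (dest >> (operand_size - 1)) & 1     # sign bit of the result
    return dest, cf, zf, sf
```

For example, `bzhi(0x12345678, 16)` keeps the low 16 bits and yields 0x5678 with CF = 0; an index of 32 or more on a 32-bit operand leaves the value unchanged and sets CF.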

Flags Affected 

ZF, CF and SF flags are updated based on the result. OF flag is cleared. AF and PF flags are undefined. 

Intel C/C++ Compiler Intrinsic Equivalent 

BZHI: unsigned int32 _bzhi_u32(unsigned int32 src, unsigned int32 index); 

BZHI: unsigned int64 _bzhi_u64(unsigned int64 src, unsigned int32 index); 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Section 2.5.1, "Exception Conditions for VEX-Encoded GPR Instructions", Table 2-29; additionally 
#UD If VEX.W = 1.
CALL—Call Procedure


Opcode         Instruction    Op/En  64-bit Mode  Compat/Leg Mode  Description
E8 cw          CALL rel16     M      N.S.         Valid            Call near, relative, displacement relative to next
                                                                   instruction.
E8 cd          CALL rel32     M      Valid        Valid            Call near, relative, displacement relative to next
                                                                   instruction. 32-bit displacement sign extended to
                                                                   64-bits in 64-bit mode.
FF /2          CALL r/m16     M      N.E.         Valid            Call near, absolute indirect, address given in r/m16.
FF /2          CALL r/m32     M      N.E.         Valid            Call near, absolute indirect, address given in r/m32.
FF /2          CALL r/m64     M      Valid        N.E.             Call near, absolute indirect, address given in r/m64.
9A cd          CALL ptr16:16  D      Invalid      Valid            Call far, absolute, address given in operand.
9A cp          CALL ptr16:32  D      Invalid      Valid            Call far, absolute, address given in operand.
FF /3          CALL m16:16    M      Valid        Valid            Call far, absolute indirect address given in m16:16.
                                                                   In 32-bit mode: if selector points to a gate, then RIP
                                                                   = 32-bit zero extended displacement taken from
                                                                   gate; else RIP = zero extended 16-bit offset from
                                                                   far pointer referenced in the instruction.
FF /3          CALL m16:32    M      Valid        Valid            In 64-bit mode: If selector points to a gate, then RIP
                                                                   = 64-bit displacement taken from gate; else RIP =
                                                                   zero extended 32-bit offset from far pointer
                                                                   referenced in the instruction.
REX.W + FF /3  CALL m16:64    M      Valid        N.E.             In 64-bit mode: If selector points to a gate, then RIP
                                                                   = 64-bit displacement taken from gate; else RIP =
                                                                   64-bit offset from far pointer referenced in the
                                                                   instruction.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
D      Offset         NA         NA         NA
M      ModRM:r/m (r)  NA         NA         NA


Description 

Saves procedure linking information on the stack and branches to the called procedure specified using the target 
operand. The target operand specifies the address of the first instruction in the called procedure. The operand can 
be an immediate value, a general-purpose register, or a memory location. 

This instruction can be used to execute four types of calls: 

• Near Call — A call to a procedure in the current code segment (the segment currently pointed to by the CS 
register), sometimes referred to as an intra-segment call. 

• Far Call — A call to a procedure located in a different segment than the current code segment, sometimes 
referred to as an inter-segment call. 

• Inter-privilege-level far call — A far call to a procedure in a segment at a different privilege level than that
of the currently executing program or procedure.

• Task switch — A call to a procedure located in a different task. 

The latter two call types (inter-privilege-level call and task switch) can only be executed in protected mode. See
"Calling Procedures Using CALL and RET" in Chapter 6 of the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 1, for additional information on near, far, and inter-privilege-level calls. See Chapter 7,
"Task Management," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for information
on performing task switches with the CALL instruction.
Near Call. When executing a near call, the processor pushes the value of the EIP register (which contains the offset
of the instruction following the CALL instruction) on the stack (for use later as a return-instruction pointer). The
processor then branches to the address in the current code segment specified by the target operand. The target
operand specifies either an absolute offset in the code segment (an offset from the base of the code segment) or a
relative offset (a signed displacement relative to the current value of the instruction pointer in the EIP register; this
value points to the instruction following the CALL instruction). The CS register is not changed on near calls.

For a near absolute call, an absolute offset is specified indirectly in a general-purpose register or a memory location
(r/m16, r/m32, or r/m64). The operand-size attribute determines the size of the target operand (16, 32, or 64
bits). When in 64-bit mode, the operand size for near call (and all near branches) is forced to 64 bits. Absolute
offsets are loaded directly into the EIP(RIP) register. If the operand-size attribute is 16, the upper two bytes of the
EIP register are cleared, resulting in a maximum instruction pointer size of 16 bits. When accessing an absolute
offset indirectly using the stack pointer [ESP] as the base register, the base value used is the value of the ESP
before the instruction executes.

A relative offset (rel16 or rel32) is generally specified as a label in assembly code. At the machine-code level, however,
it is encoded as a signed, 16- or 32-bit immediate value. This value is added to the value in the EIP(RIP) register. In
64-bit mode the relative offset is always a 32-bit immediate value which is sign extended to 64 bits before it is
added to the value in the RIP register for the target calculation. As with absolute offsets, the operand-size attribute
determines the size of the target operand (16, 32, or 64 bits). In 64-bit mode the target operand will always be 64
bits because the operand size is forced to 64 bits for near branches.
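The target arithmetic for near relative calls can be sketched in C as follows. The helper names are hypothetical; next_rip and next_eip stand for the address of the instruction following the CALL (the value that is pushed as the return address). The code-segment limit check and the stack push are not modeled.

```c
#include <stdint.h>

/* 64-bit mode: rel32 is sign-extended to 64 bits, then added to RIP. */
static uint64_t call_rel_target64(uint64_t next_rip, int32_t rel32)
{
    return next_rip + (uint64_t)(int64_t)rel32;   /* wraps modulo 2^64 */
}

/* 32-bit operand size: EIP + rel32, modulo 2^32. */
static uint32_t call_rel_target32(uint32_t next_eip, int32_t rel32)
{
    return next_eip + (uint32_t)rel32;
}

/* 16-bit operand size: (EIP + rel16) AND 0000FFFFH. */
static uint32_t call_rel_target16(uint32_t next_eip, int16_t rel16)
{
    return (next_eip + (uint32_t)(int32_t)rel16) & 0x0000FFFFu;
}
```

For example, the 5-byte encoding E8 FB FF FF FF carries rel32 = -5, so the call targets the first byte of the CALL instruction itself.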

Far Calls in Real-Address or Virtual-8086 Mode. When executing a far call in real-address or virtual-8086 mode, the
processor pushes the current value of both the CS and EIP registers on the stack for use as a return-instruction
pointer. The processor then performs a "far branch" to the code segment and offset specified with the target
operand for the called procedure. The target operand specifies an absolute far address either directly with a pointer
(ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). With the pointer method, the
segment and offset of the called procedure are encoded in the instruction using a 4-byte (16-bit operand size) or
6-byte (32-bit operand size) far address immediate. With the indirect method, the target operand specifies a memory
location that contains a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address. The operand-size
attribute determines the size of the offset (16 or 32 bits) in the far address. The far address is loaded directly into
the CS and EIP registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.
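The layout of the far address can be illustrated with a small C decoder. It assumes the little-endian byte order used by IA-32 memory operands and immediates, with the offset stored before the selector (so DEST[31:0] is the offset and DEST[47:32] the selector for the 32-bit form); the type and function names are hypothetical.

```c
#include <stdint.h>

/* A decoded far pointer: segment selector plus offset. */
typedef struct {
    uint16_t selector;   /* loaded into CS */
    uint32_t offset;     /* loaded into EIP */
} far_ptr;

/* m16:32 / ptr16:32: 4-byte offset followed by a 2-byte selector. */
static far_ptr decode_m16_32(const uint8_t b[6])
{
    far_ptr p;
    p.offset = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    p.selector = (uint16_t)(b[4] | (b[5] << 8));
    return p;
}

/* m16:16 / ptr16:16: 2-byte offset followed by a 2-byte selector. */
static far_ptr decode_m16_16(const uint8_t b[4])
{
    far_ptr p;
    p.offset = (uint32_t)(b[0] | (b[1] << 8));   /* upper 16 bits of EIP cleared */
    p.selector = (uint16_t)(b[2] | (b[3] << 8));
    return p;
}
```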

Far Calls in Protected Mode. When the processor is operating in protected mode, the CALL instruction can be used to 
perform the following types of far calls: 

• Far call to the same privilege level 

• Far call to a different privilege level (inter-privilege level call) 

• Task switch (far call to another task) 

In protected mode, the processor always uses the segment selector part of the far address to access the
corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access
rights determine the type of call operation to be performed.
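The dispatch on descriptor type can be sketched in C. The sketch assumes the standard layout of byte 5 of a segment descriptor (type in bits 3:0, S flag in bit 4) and the system-descriptor type codes documented in Volume 3A. It only classifies the target; the L/D, DPL, and present checks are left to the later steps, and all names are hypothetical.

```c
#include <stdint.h>

typedef enum {
    CALL_TARGET_INVALID,              /* #GP(segment selector) */
    CALL_TARGET_CONFORMING_CODE,
    CALL_TARGET_NONCONFORMING_CODE,
    CALL_TARGET_CALL_GATE,
    CALL_TARGET_TASK_GATE,
    CALL_TARGET_TSS
} call_target;

/* access: byte 5 of the descriptor; lma: nonzero when IA32_EFER.LMA = 1. */
static call_target classify_far_call_target(uint8_t access, int lma)
{
    uint8_t type = access & 0x0Fu;
    int s = (access >> 4) & 1;

    if (s) {                                   /* code or data segment */
        if (!(type & 0x8u))
            return CALL_TARGET_INVALID;        /* data segment */
        return (type & 0x4u) ? CALL_TARGET_CONFORMING_CODE
                             : CALL_TARGET_NONCONFORMING_CODE;
    }
    if (lma)                                   /* IA-32e mode: 64-bit call gate only */
        return (type == 0xCu) ? CALL_TARGET_CALL_GATE : CALL_TARGET_INVALID;

    switch (type) {
    case 0x4u: case 0xCu: return CALL_TARGET_CALL_GATE;   /* 16-/32-bit call gate */
    case 0x5u:            return CALL_TARGET_TASK_GATE;
    case 0x1u: case 0x3u:                                  /* 16-bit TSS, available/busy */
    case 0x9u: case 0xBu: return CALL_TARGET_TSS;         /* 32-bit TSS, available/busy */
    default:              return CALL_TARGET_INVALID;
    }
}
```

Busy TSS types are still routed to the TSS case here, mirroring the pseudocode, where the busy check happens after the dispatch.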

If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is
performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming,
a general-protection exception is generated.) A far call to the same privilege level in protected mode is very similar
to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far address either
directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The
operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment
selector and its descriptor are loaded into the CS register; the offset from the instruction is loaded into the EIP register.
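The privilege rules for a direct far call (no gate) reduce to two comparisons, sketched here in C after the CONFORMING-CODE-SEGMENT and NONCONFORMING-CODE-SEGMENT pseudocode. The function name is hypothetical. In both cases CPL does not change, because CS(RPL) is set to the current CPL.

```c
#include <stdbool.h>

/* Returns true if a direct far call is permitted, false if it raises
   #GP(new code segment selector). dpl: target segment DPL; rpl: RPL of the
   selector in the target operand; cpl: current privilege level. */
static bool direct_far_call_allowed(bool conforming, unsigned dpl,
                                    unsigned rpl, unsigned cpl)
{
    if (conforming)
        return dpl <= cpl;               /* IF DPL > CPL THEN #GP */
    return rpl <= cpl && dpl == cpl;     /* IF (RPL > CPL) or (DPL != CPL) THEN #GP */
}
```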

A call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the same 
privilege level. Using this mechanism provides an extra level of indirection and is the preferred method of making 
calls between 16-bit and 32-bit code segments. 

When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed
through a call gate. The segment selector specified by the target operand identifies the call gate. The target
operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly
with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code
segment and the new instruction pointer (offset) from the call gate descriptor. (The offset from the target operand
is ignored when a call gate is used.)
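For a 32-bit call gate, the selector, entry offset, and parameter count come from the 8-byte gate descriptor. The C sketch below assumes the legacy call-gate layout described in Volume 3A (offset 15:0 and selector in the low doubleword; parameter count, type, DPL, P, and offset 31:16 in the high doubleword); the struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint16_t selector;     /* code-segment selector of the called procedure */
    uint32_t offset;       /* entry point; the target operand's offset is ignored */
    unsigned param_count;  /* doublewords copied on an inter-privilege call */
    unsigned type;         /* 0xC for a 32-bit call gate */
    unsigned dpl;
    unsigned present;
} call_gate32;

/* Decode a legacy 32-bit call-gate descriptor given as two doublewords. */
static call_gate32 decode_call_gate32(uint32_t lo, uint32_t hi)
{
    call_gate32 g;
    g.offset      = (hi & 0xFFFF0000u) | (lo & 0x0000FFFFu);
    g.selector    = (uint16_t)(lo >> 16);
    g.param_count = hi & 0x1Fu;          /* "masked to 5 bits" in the pseudocode */
    g.type        = (hi >> 8) & 0xFu;
    g.dpl         = (hi >> 13) & 0x3u;
    g.present     = (hi >> 15) & 0x1u;
    return g;
}
```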
On inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The
segment selector for the new stack segment is specified in the TSS for the currently running task. The branch to
the new code segment occurs after the stack switch. (Note that when using a call gate to perform a far call to a
segment at the same privilege level, no stack switch occurs.) On the new stack, the processor pushes the segment
selector and stack pointer for the calling procedure's stack, an optional set of parameters from the calling
procedure's stack, and the segment selector and instruction pointer for the calling procedure's code segment. (A value in
the call gate descriptor determines how many parameters to copy to the new stack.) Finally, the processor
branches to the address of the procedure being called within the new code segment.
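The frame built on the new stack by a 32-bit inter-privilege call can be simulated in C. This is a simplified, hypothetical model: the stack is an array of doublewords indexed by an esp value that is decremented before each store, selectors are zero-extended to 32 bits, and the room, limit, and present checks are omitted.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t *mem;
    size_t esp;          /* index of the current top-of-stack slot */
} stack32;

static void push32(stack32 *s, uint32_t v)
{
    s->mem[--s->esp] = v;            /* decrement, then store */
}

/* params[0] is the doubleword at the caller's ESP (the last one pushed). */
static void build_interpriv_frame(stack32 *new_stack,
                                  uint16_t old_ss, uint32_t old_esp,
                                  const uint32_t *params, unsigned count,
                                  uint16_t old_cs, uint32_t old_eip)
{
    push32(new_stack, old_ss);
    push32(new_stack, old_esp);
    /* copy parameters so they keep the same order as on the caller's stack */
    for (unsigned i = count; i > 0; i--)
        push32(new_stack, params[i - 1]);
    push32(new_stack, old_cs);
    push32(new_stack, old_eip);      /* new ESP points here */
}
```

After the call, the new ESP points at the saved EIP, and the copied parameters sit between the saved CS and the saved ESP.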

Executing a task switch with the CALL instruction is similar to executing a call through a call gate. The target
operand specifies the segment selector of the task gate for the new task activated by the switch (the offset in the
target operand is ignored). The task gate in turn points to the TSS for the new task, which contains the segment
selectors for the task's code and stack segments. Note that the TSS also contains the EIP value for the next
instruction that was to be executed when the new task was last suspended. This instruction pointer value is loaded into the
EIP register to restart the new task.

The CALL instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of
the task gate. See Chapter 7, "Task Management," in the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 3A, for information on the mechanics of a task switch.

When you execute a task switch with a CALL instruction, the nested task flag (NT) is set in the EFLAGS register and
the new TSS's previous task link field is loaded with the old task's TSS selector. Code is expected to suspend this
nested task by executing an IRET instruction which, because the NT flag is set, automatically uses the previous
task link to return to the calling task. (See "Task Linking" in Chapter 7 of the Intel® 64 and IA-32 Architectures
Software Developer's Manual, Volume 3A, for information on nested tasks.) Switching tasks with the CALL
instruction differs in this regard from the JMP instruction. JMP does not set the NT flag and therefore does not expect an IRET
instruction to suspend the task.

Mixing 16-Bit and 32-Bit Calls. When making far calls between 16-bit and 32-bit code segments, use a call gate. If
the far call is from a 32-bit code segment to a 16-bit code segment, the call should be made from the first 64
KBytes of the 32-bit code segment. This is because the operand-size attribute of the instruction is set to 16, so only
a 16-bit return address offset can be saved. Also, the call should be made using a 16-bit call gate so that 16-bit
values can be pushed on the stack. See Chapter 21, "Mixing 16-Bit and 32-Bit Code," in the Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 3B, for more information.

Far Calls in Compatibility Mode. When the processor is operating in compatibility mode, the CALL instruction can be 
used to perform the following types of far calls: 

• Far call to the same privilege level, remaining in compatibility mode 

• Far call to the same privilege level, transitioning to 64-bit mode 

• Far call to a different privilege level (inter-privilege level call), transitioning to 64-bit mode 

Note that a CALL instruction cannot be used to cause a task switch in compatibility mode, since task switches are
not supported in IA-32e mode.

In compatibility mode, the processor always uses the segment selector part of the far address to access the
corresponding descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine
the type of call operation to be performed.
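The selector fields used for this descriptor lookup can be extracted as in the following C sketch. It assumes the standard selector layout (index in bits 15:3, table indicator in bit 2, RPL in bits 1:0) and 8-byte descriptors for the limit check; the 16-byte call gates used in IA-32e mode would need a check that covers all 16 bytes. The names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    unsigned index;  /* bits 15:3: descriptor index */
    unsigned ti;     /* bit 2: 0 = GDT, 1 = LDT */
    unsigned rpl;    /* bits 1:0: requested privilege level */
} selector_fields;

static selector_fields decode_selector(uint16_t sel)
{
    selector_fields f;
    f.index = sel >> 3;
    f.ti    = (sel >> 2) & 1u;
    f.rpl   = sel & 3u;
    return f;
}

/* Nonzero if the whole 8-byte descriptor lies within the table limit. */
static int selector_within_limit(uint16_t sel, uint32_t table_limit)
{
    uint32_t last_byte = (uint32_t)(sel & 0xFFF8u) + 7u;   /* index * 8 + 7 */
    return last_byte <= table_limit;
}
```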

If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is
performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming,
a general-protection exception is generated.) A far call to the same privilege level in compatibility mode is very
similar to one carried out in protected mode. The target operand specifies an absolute far address either directly
with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32). The operand-size
attribute determines the size of the offset (16 or 32 bits) in the far address. The new code segment selector and its
descriptor are loaded into the CS register and the offset from the instruction is loaded into the EIP register. The
difference is that 64-bit mode may be entered. This is specified by the L bit in the new code segment descriptor.

Note that a 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code
segment at the same privilege level. However, using this mechanism requires that the target code segment
descriptor have the L bit set, causing an entry to 64-bit mode.

When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed
through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target
operand can specify the call gate segment selector either directly with a pointer (ptr16:16 or ptr16:32) or indirectly
with a memory location (m16:16 or m16:32). The processor obtains the segment selector for the new code
segment and the new instruction pointer (offset) from the 16-byte call gate descriptor. (The offset from the target
operand is ignored when a call gate is used.)
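Assembling the 64-bit entry point from a 16-byte call gate can be sketched in C. The sketch assumes the IA-32e call-gate layout in Volume 3A (offset 15:0 and the selector in the first doubleword, offset 31:16 in the upper half of the second, offset 63:32 in the third) and skips the type, DPL, and present checks; the function name is hypothetical.

```c
#include <stdint.h>

/* d[0..3]: the four doublewords of the gate, lowest address first. */
static void decode_call_gate64(const uint32_t d[4], uint16_t *selector,
                               uint64_t *offset)
{
    *selector = (uint16_t)(d[0] >> 16);
    *offset = (uint64_t)(d[0] & 0x0000FFFFu)      /* offset 15:0  */
            | (uint64_t)(d[1] & 0xFFFF0000u)      /* offset 31:16 */
            | ((uint64_t)d[2] << 32);             /* offset 63:32 */
}
```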

On inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The
segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the
currently running task. The branch to the new code segment occurs after the stack switch. (Note that when using
a call gate to perform a far call to a segment at the same privilege level, an implicit stack switch occurs as a result
of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use a segment base of 0x0,
the limit is ignored, and the default stack size is 64 bits. The full value of RSP is used for the offset, of which the
upper 32 bits are undefined.) On the new stack, the processor pushes the segment selector and stack pointer for
the calling procedure's stack and the segment selector and instruction pointer for the calling procedure's code
segment. (Parameter copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the
procedure being called within the new code segment.

Near/(Far) Calls in 64-bit Mode. When the processor is operating in 64-bit mode, the CALL instruction can be used to 
perform the following types of far calls: 

• Far call to the same privilege level, transitioning to compatibility mode 

• Far call to the same privilege level, remaining in 64-bit mode 

• Far call to a different privilege level (inter-privilege level call), remaining in 64-bit mode 

Note that in 64-bit mode the CALL instruction cannot be used to cause a task switch, since task
switches are not supported in IA-32e mode.

In 64-bit mode, the processor always uses the segment selector part of the far address to access the corresponding
descriptor in the GDT or LDT. The descriptor type (code segment, call gate) and access rights determine the type
of call operation to be performed.

If the selected descriptor is for a code segment, a far call to a code segment at the same privilege level is
performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming,
a general-protection exception is generated.) A far call to the same privilege level in 64-bit mode is very similar to
one carried out in compatibility mode. The target operand specifies an absolute far address indirectly with a
memory location (m16:16, m16:32 or m16:64). The form of CALL with a direct specification of absolute far
address is not defined in 64-bit mode. The operand-size attribute determines the size of the offset (16, 32, or 64
bits) in the far address. The new code segment selector and its descriptor are loaded into the CS register; the offset
from the instruction is loaded into the EIP register. The new code segment may specify entry either into
compatibility or 64-bit mode, based on the L bit value.

A 64-bit call gate (described in the next paragraph) can also be used to perform a far call to a code segment at the
same privilege level. However, using this mechanism requires that the target code segment descriptor have the L
bit set.

When executing an inter-privilege-level far call, the code segment for the procedure being called must be accessed
through a 64-bit call gate. The segment selector specified by the target operand identifies the call gate. The target
operand can only specify the call gate segment selector indirectly with a memory location (m16:16, m16:32 or
m16:64). The processor obtains the segment selector for the new code segment and the new instruction pointer
(offset) from the 16-byte call gate descriptor. (The offset from the target operand is ignored when a call gate is
used.)

On inter-privilege-level calls, the processor switches to the stack for the privilege level of the called procedure. The 
segment selector for the new stack segment is set to NULL. The new stack pointer is specified in the TSS for the 
currently running task. The branch to the new code segment occurs after the stack switch. 
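The TSS lookup performed on an inter-privilege-level call (the MORE-PRIVILEGE step of the Operation pseudocode) can be sketched in C for the 16-bit, 32-bit, and 64-bit TSS formats. The function and enum names are hypothetical; the function returns the offset of the stack-pointer field for the target DPL and reports false where the processor would raise #TS(current TSS selector).

```c
#include <stdbool.h>
#include <stdint.h>

typedef enum { TSS16, TSS32, TSS64 } tss_kind;

static bool tss_stack_slot(tss_kind kind, unsigned dpl, uint32_t tss_limit,
                           uint32_t *sp_offset)
{
    uint32_t addr, last;
    switch (kind) {
    case TSS32: addr = dpl * 8u + 4u; last = addr + 5u; break;  /* ESPn, SSn at +4 */
    case TSS16: addr = dpl * 4u + 2u; last = addr + 3u; break;  /* SPn, SSn at +2 */
    default:    addr = dpl * 8u + 4u; last = addr + 7u; break;  /* 8-byte RSPn */
    }
    if (last > tss_limit)
        return false;                 /* #TS(current TSS selector) */
    *sp_offset = addr;
    return true;
}
```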

Note that when using a call gate to perform a far call to a segment at the same privilege level, an implicit stack
switch occurs as a result of entering 64-bit mode. The SS selector is unchanged, but stack segment accesses use
a segment base of 0x0, the limit is ignored, and the default stack size is 64 bits. (The full value of RSP is used for
the offset.) On the new stack, the processor pushes the segment selector and stack pointer for the calling
procedure's stack and the segment selector and instruction pointer for the calling procedure's code segment. (Parameter
copy is not supported in IA-32e mode.) Finally, the processor branches to the address of the procedure being called
within the new code segment.
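The canonical-address checks that appear throughout the Operation pseudocode (for example, "IF tempEIP is non-canonical") can be sketched in C. The sketch assumes a 48-bit linear-address width (4-level paging); with 5-level paging the width would be 57. The function name is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Canonical: bits 63 down to (width - 1) are all 0 or all 1. */
static bool is_canonical(uint64_t addr, unsigned width)
{
    uint64_t upper = addr >> (width - 1u);                   /* bits 63:width-1 */
    uint64_t all_ones = (UINT64_C(1) << (65u - width)) - 1u; /* 65 - width ones */
    return upper == 0 || upper == all_ones;
}
```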
Operation

IF near call
    THEN IF near relative call
        THEN
            IF OperandSize = 64
                THEN
                    tempDEST ← SignExtend(DEST); (* DEST is rel32 *)
                    tempRIP ← RIP + tempDEST;
                    IF stack not large enough for an 8-byte return address
                        THEN #SS(0); FI;
                    Push(RIP);
                    RIP ← tempRIP;
            FI;
            IF OperandSize = 32
                THEN
                    tempEIP ← EIP + DEST; (* DEST is rel32 *)
                    IF tempEIP is not within code segment limit THEN #GP(0); FI;
                    IF stack not large enough for a 4-byte return address
                        THEN #SS(0); FI;
                    Push(EIP);
                    EIP ← tempEIP;
            FI;
            IF OperandSize = 16
                THEN
                    tempEIP ← (EIP + DEST) AND 0000FFFFH; (* DEST is rel16 *)
                    IF tempEIP is not within code segment limit THEN #GP(0); FI;
                    IF stack not large enough for a 2-byte return address
                        THEN #SS(0); FI;
                    Push(IP);
                    EIP ← tempEIP;
            FI;
        ELSE (* Near absolute call *)
            IF OperandSize = 64
                THEN
                    tempRIP ← DEST; (* DEST is r/m64 *)
                    IF stack not large enough for an 8-byte return address
                        THEN #SS(0); FI;
                    Push(RIP);
                    RIP ← tempRIP;
            FI;
            IF OperandSize = 32
                THEN
                    tempEIP ← DEST; (* DEST is r/m32 *)
                    IF tempEIP is not within code segment limit THEN #GP(0); FI;
                    IF stack not large enough for a 4-byte return address
                        THEN #SS(0); FI;
                    Push(EIP);
                    EIP ← tempEIP;
            FI;
            IF OperandSize = 16
                THEN
                    tempEIP ← DEST AND 0000FFFFH; (* DEST is r/m16 *)
                    IF tempEIP is not within code segment limit THEN #GP(0); FI;
                    IF stack not large enough for a 2-byte return address
                        THEN #SS(0); FI;
                    Push(IP);
                    EIP ← tempEIP;
            FI;
    FI; (* rel/abs *)
FI; (* near *)

IF far call and (PE = 0 or (PE = 1 and VM = 1)) (* Real-address or virtual-8086 mode *)
    THEN
        IF OperandSize = 32
            THEN
                IF stack not large enough for a 6-byte return address
                    THEN #SS(0); FI;
                IF DEST[31:16] is not zero THEN #GP(0); FI;
                Push(CS); (* Padded with 16 high-order bits *)
                Push(EIP);
                CS ← DEST[47:32]; (* DEST is ptr16:32 or [m16:32] *)
                EIP ← DEST[31:0]; (* DEST is ptr16:32 or [m16:32] *)
            ELSE (* OperandSize = 16 *)
                IF stack not large enough for a 4-byte return address
                    THEN #SS(0); FI;
                Push(CS);
                Push(IP);
                CS ← DEST[31:16]; (* DEST is ptr16:16 or [m16:16] *)
                EIP ← DEST[15:0]; (* DEST is ptr16:16 or [m16:16]; clear upper 16 bits *)
        FI;
FI;

IF far call and (PE = 1 and VM = 0) (* Protected mode or IA-32e mode, not virtual-8086 mode *)
    THEN
        IF segment selector in target operand NULL
            THEN #GP(0); FI;
        IF segment selector index not within descriptor table limits
            THEN #GP(new code segment selector); FI;
        Read type and access rights of selected segment descriptor;
        IF IA32_EFER.LMA = 0
            THEN
                IF segment type is not a conforming or nonconforming code segment, call
                gate, task gate, or TSS
                    THEN #GP(segment selector); FI;
            ELSE
                IF segment type is not a conforming or nonconforming code segment or
                64-bit call gate
                    THEN #GP(segment selector); FI;
        FI;
        Depending on type and access rights:
            GO TO CONFORMING-CODE-SEGMENT;
            GO TO NONCONFORMING-CODE-SEGMENT;
            GO TO CALL-GATE;
            GO TO TASK-GATE;
            GO TO TASK-STATE-SEGMENT;
CONFORMING-CODE-SEGMENT:
    IF L bit = 1 and D bit = 1 and IA32_EFER.LMA = 1
        THEN #GP(new code segment selector); FI;
    IF DPL > CPL
        THEN #GP(new code segment selector); FI;
    IF segment not present
        THEN #NP(new code segment selector); FI;
    IF stack not large enough for return address
        THEN #SS(0); FI;
    tempEIP ← DEST(Offset);
    IF OperandSize = 16
        THEN
            tempEIP ← tempEIP AND 0000FFFFH; FI; (* Clear upper 16 bits *)
    IF (EFER.LMA = 0 or target mode = Compatibility mode) and (tempEIP outside new code
    segment limit)
        THEN #GP(0); FI;
    IF tempEIP is non-canonical
        THEN #GP(0); FI;
    IF OperandSize = 32
        THEN
            Push(CS); (* Padded with 16 high-order bits *)
            Push(EIP);
            CS ← DEST(CodeSegmentSelector);
            (* Segment descriptor information also loaded *)
            CS(RPL) ← CPL;
            EIP ← tempEIP;
        ELSE
            IF OperandSize = 16
                THEN
                    Push(CS);
                    Push(IP);
                    CS ← DEST(CodeSegmentSelector);
                    (* Segment descriptor information also loaded *)
                    CS(RPL) ← CPL;
                    EIP ← tempEIP;
                ELSE (* OperandSize = 64 *)
                    Push(CS); (* Padded with 48 high-order bits *)
                    Push(RIP);
                    CS ← DEST(CodeSegmentSelector);
                    (* Segment descriptor information also loaded *)
                    CS(RPL) ← CPL;
                    RIP ← tempEIP;
            FI;
    FI;
END;

NONCONFORMING-CODE-SEGMENT:
    IF L-bit = 1 and D-bit = 1 and IA32_EFER.LMA = 1
        THEN #GP(new code segment selector); FI;
    IF (RPL > CPL) or (DPL ≠ CPL)
        THEN #GP(new code segment selector); FI;
    IF segment not present
        THEN #NP(new code segment selector); FI;
    IF stack not large enough for return address
        THEN #SS(0); FI;
    tempEIP ← DEST(Offset);
    IF OperandSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH; FI; (* Clear upper 16 bits *)
    IF (EFER.LMA = 0 or target mode = Compatibility mode) and (tempEIP outside new code
    segment limit)
        THEN #GP(0); FI;
    IF tempEIP is non-canonical
        THEN #GP(0); FI;
    IF OperandSize = 32
        THEN
            Push(CS); (* Padded with 16 high-order bits *)
            Push(EIP);
            CS ← DEST(CodeSegmentSelector);
            (* Segment descriptor information also loaded *)
            CS(RPL) ← CPL;
            EIP ← tempEIP;
        ELSE
            IF OperandSize = 16
                THEN
                    Push(CS);
                    Push(IP);
                    CS ← DEST(CodeSegmentSelector);
                    (* Segment descriptor information also loaded *)
                    CS(RPL) ← CPL;
                    EIP ← tempEIP;
                ELSE (* OperandSize = 64 *)
                    Push(CS); (* Padded with 48 high-order bits *)
                    Push(RIP);
                    CS ← DEST(CodeSegmentSelector);
                    (* Segment descriptor information also loaded *)
                    CS(RPL) ← CPL;
                    RIP ← tempEIP;
            FI;
    FI;
END;

CALL-GATE:
    IF call gate (DPL < CPL) or (RPL > DPL)
        THEN #GP(call-gate selector); FI;
    IF call gate not present
        THEN #NP(call-gate selector); FI;
    IF call-gate code-segment selector is NULL
        THEN #GP(0); FI;
    IF call-gate code-segment selector index is outside descriptor table limits
        THEN #GP(call-gate code-segment selector); FI;
    Read call-gate code-segment descriptor;
    IF call-gate code-segment descriptor does not indicate a code segment
    or call-gate code-segment descriptor DPL > CPL
        THEN #GP(call-gate code-segment selector); FI;
    IF IA32_EFER.LMA = 1 AND (call-gate code-segment descriptor is
    not a 64-bit code segment or call-gate code-segment descriptor has both L-bit and D-bit set)
        THEN #GP(call-gate code-segment selector); FI;
    IF call-gate code segment not present
        THEN #NP(call-gate code-segment selector); FI;
    IF call-gate code segment is non-conforming and DPL < CPL
        THEN go to MORE-PRIVILEGE;
        ELSE go to SAME-PRIVILEGE;
    FI;
END;

MORE-PRIVILEGE:
    IF current TSS is 32-bit
        THEN
            TSSstackAddress ← (new code-segment DPL * 8) + 4;
            IF (TSSstackAddress + 5) > current TSS limit
                THEN #TS(current TSS selector); FI;
            NewSS ← 2 bytes loaded from (TSS base + TSSstackAddress + 4);
            NewESP ← 4 bytes loaded from (TSS base + TSSstackAddress);
        ELSE
            IF current TSS is 16-bit
                THEN
                    TSSstackAddress ← (new code-segment DPL * 4) + 2;
                    IF (TSSstackAddress + 3) > current TSS limit
                        THEN #TS(current TSS selector); FI;
                    NewSS ← 2 bytes loaded from (TSS base + TSSstackAddress + 2);
                    NewESP ← 2 bytes loaded from (TSS base + TSSstackAddress);
                ELSE (* current TSS is 64-bit *)
                    TSSstackAddress ← (new code-segment DPL * 8) + 4;
                    IF (TSSstackAddress + 7) > current TSS limit
                        THEN #TS(current TSS selector); FI;
                    NewSS ← new code-segment DPL; (* NULL selector with RPL = new CPL *)
                    NewRSP ← 8 bytes loaded from (current TSS base + TSSstackAddress);
            FI;
    FI;

    IF IA32_EFER.LMA = 0 and NewSS is NULL
        THEN #TS(NewSS); FI;
    Read new code-segment descriptor and new stack-segment descriptor;
    IF IA32_EFER.LMA = 0 and (NewSS RPL ≠ new code-segment DPL
    or new stack-segment DPL ≠ new code-segment DPL or new stack segment is not a
    writable data segment)
        THEN #TS(NewSS); FI;
    IF IA32_EFER.LMA = 0 and new stack segment not present
        THEN #SS(NewSS); FI;

    IF CallGateSize = 32
        THEN
            IF new stack does not have room for parameters plus 16 bytes
                THEN #SS(NewSS); FI;
            IF CallGate(InstructionPointer) not within new code-segment limit
                THEN #GP(0); FI;
            SS ← NewSS; (* Segment descriptor information also loaded *)
            ESP ← NewESP;
            CS:EIP ← CallGate(CS:InstructionPointer);
            (* Segment descriptor information also loaded *)
            Push(oldSS:oldESP); (* From calling procedure *)
            temp ← parameter count from call gate, masked to 5 bits;
            Push(parameters from calling procedure's stack, temp)
            Push(oldCS:oldEIP); (* Return address to calling procedure *)
        ELSE
            IF CallGateSize = 16
                THEN
                    IF new stack does not have room for parameters plus 8 bytes
                        THEN #SS(NewSS); FI;
                    IF (CallGate(InstructionPointer) AND FFFFH) not in new code-segment limit
                        THEN #GP(0); FI;
                    SS ← NewSS; (* Segment descriptor information also loaded *)
                    ESP ← NewESP;
                    CS:IP ← CallGate(CS:InstructionPointer);
                    (* Segment descriptor information also loaded *)
                    Push(oldSS:oldESP); (* From calling procedure *)
                    temp ← parameter count from call gate, masked to 5 bits;
                    Push(parameters from calling procedure's stack, temp)
                    Push(oldCS:oldEIP); (* Return address to calling procedure *)
                ELSE (* CallGateSize = 64 *)
                    IF pushing 32 bytes on the stack would use a non-canonical address
                        THEN #SS(NewSS); FI;
                    IF (CallGate(InstructionPointer) is non-canonical)
                        THEN #GP(0); FI;
                    SS ← NewSS; (* NewSS is NULL *)
                    RSP ← NewRSP;
                    CS:IP ← CallGate(CS:InstructionPointer);
                    (* Segment descriptor information also loaded *)
                    Push(oldSS:oldESP); (* From calling procedure *)
                    Push(oldCS:oldEIP); (* Return address to calling procedure *)
            FI;
    FI;
    CPL ← CodeSegment(DPL);
    CS(RPL) ← CPL;
END;

SAME-PRIVILEGE:
    IF CallGateSize = 32
        THEN
            IF stack does not have room for 8 bytes
                THEN #SS(0); FI;
            IF CallGate(InstructionPointer) not within code segment limit
                THEN #GP(0); FI;
            CS:EIP ← CallGate(CS:EIP); (* Segment descriptor information also loaded *)
            Push(oldCS:oldEIP); (* Return address to calling procedure *)
        ELSE
            IF CallGateSize = 16
                THEN
                    IF stack does not have room for 4 bytes
                        THEN #SS(0); FI;
                    IF CallGate(InstructionPointer) not within code segment limit
                        THEN #GP(0); FI;
                    CS:IP ← CallGate(CS:instruction pointer);
                    (* Segment descriptor information also loaded *)
                    Push(oldCS:oldIP); (* Return address to calling procedure *)
                ELSE (* CallGateSize = 64 *)
                    IF pushing 16 bytes on the stack touches non-canonical addresses
                        THEN #SS(0); FI;
                    IF RIP non-canonical
                        THEN #GP(0); FI;
                    CS:IP ← CallGate(CS:instruction pointer);
                    (* Segment descriptor information also loaded *)
                    Push(oldCS:oldIP); (* Return address to calling procedure *)
            FI;
    FI;
    CS(RPL) ← CPL;
END;

TASK-GATE:
    IF task gate DPL < CPL or RPL
        THEN #GP(task gate selector); FI;
    IF task gate not present
        THEN #NP(task gate selector); FI;
    Read the TSS segment selector in the task-gate descriptor;
    IF TSS segment selector local/global bit is set to local
    or index not within GDT limits
        THEN #GP(TSS selector); FI;
    Access TSS descriptor in GDT;
    IF TSS descriptor specifies that the TSS is busy (low-order 5 bits set to 00001)
        THEN #GP(TSS selector); FI;
    IF TSS not present
        THEN #NP(TSS selector); FI;
    SWITCH-TASKS (with nesting) to TSS;
    IF EIP not within code segment limit
        THEN #GP(0); FI;
END;

TASK-STATE-SEGMENT:
    IF TSS DPL < CPL or RPL
    or TSS descriptor indicates TSS not available
        THEN #GP(TSS selector); FI;
    IF TSS is not present
        THEN #NP(TSS selector); FI;
    SWITCH-TASKS (with nesting) to TSS;
    IF EIP not within code segment limit
        THEN #GP(0); FI;
END;


Flags Affected 

All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur. 
Protected Mode Exceptions

#GP(0)            If the target offset in destination operand is beyond the new code segment limit.
                  If the segment selector in the destination operand is NULL.
                  If the code segment selector in the gate is NULL.
                  If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                  If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
                  selector.
#GP(selector)     If a code segment or gate or TSS selector index is outside descriptor table limits.
                  If the segment descriptor pointed to by the segment selector in the destination operand is not
                  for a conforming-code segment, nonconforming-code segment, call gate, task gate, or task
                  state segment.
                  If the DPL for a nonconforming-code segment is not equal to the CPL or the RPL for the
                  segment's segment selector is greater than the CPL.
                  If the DPL for a conforming-code segment is greater than the CPL.
                  If the DPL from a call-gate, task-gate, or TSS segment descriptor is less than the CPL or than
                  the RPL of the call-gate, task-gate, or TSS's segment selector.
                  If the segment descriptor for a segment selector from a call gate does not indicate it is a code
                  segment.
                  If the segment selector from a call gate is beyond the descriptor table limits.
                  If the DPL for a code-segment obtained from a call gate is greater than the CPL.
                  If the segment selector for a TSS has its local/global bit set for local.
                  If a TSS segment descriptor specifies that the TSS is busy or not available.
#SS(0)            If pushing the return address, parameters, or stack segment pointer onto the stack exceeds
                  the bounds of the stack segment, when no stack switch occurs.
                  If a memory operand effective address is outside the SS segment limit.
#SS(selector)     If pushing the return address, parameters, or stack segment pointer onto the stack exceeds
                  the bounds of the stack segment, when a stack switch occurs.
                  If the SS register is being loaded as part of a stack switch and the segment pointed to is
                  marked not present.
                  If stack segment does not have room for the return address, parameters, or stack segment
                  pointer, when stack switch occurs.
#NP(selector)     If a code segment, data segment, stack segment, call gate, task gate, or TSS is not present.
#TS(selector)     If the new stack segment selector and ESP are beyond the end of the TSS.
                  If the new stack segment selector is NULL.
                  If the RPL of the new stack segment selector in the TSS is not equal to the DPL of the code
                  segment being accessed.
                  If DPL of the stack segment descriptor for the new stack segment is not equal to the DPL of the
                  code segment descriptor.
                  If the new stack segment is not a writable data segment.
                  If segment-selector index for stack segment is outside descriptor table limits.
#PF(fault-code)   If a page fault occurs.
#AC(0)            If alignment checking is enabled and an unaligned memory reference is made while the
                  current privilege level is 3.
#UD               If the LOCK prefix is used.


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the target offset is beyond the code segment limit.

#UD If the LOCK prefix is used.
Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the target offset is beyond the code segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

#GP(selector) If a memory address accessed by the selector is in non-canonical space. 
#GP(0) If the target offset in the destination operand is non-canonical. 

64-Bit Mode Exceptions

#GP(0) If a memory address is non-canonical.

If target offset in destination operand is non-canonical.

If the segment selector in the destination operand is NULL.

If the code segment selector in the 64-bit gate is NULL.

#GP(selector) If code segment or 64-bit call gate is outside descriptor table limits.

If code segment or 64-bit call gate overlaps non-canonical space.

If the segment descriptor pointed to by the segment selector in the destination operand is not for a conforming-code segment, nonconforming-code segment, or 64-bit call gate.

If the segment descriptor pointed to by the segment selector in the destination operand is a code segment and has both the D-bit and the L-bit set.

If the DPL for a nonconforming-code segment is not equal to the CPL, or the RPL for the segment's segment selector is greater than the CPL.

If the DPL for a conforming-code segment is greater than the CPL.

If the DPL from a 64-bit call-gate is less than the CPL or than the RPL of the 64-bit call-gate.

If the upper type field of a 64-bit call gate is not 0x0.

If the segment selector from a 64-bit call gate is beyond the descriptor table limits.

If the DPL for a code-segment obtained from a 64-bit call gate is greater than the CPL.

If the code segment descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit set and the D-bit clear.

If the segment descriptor for a segment selector from the 64-bit call gate does not indicate it is a code segment.

#SS(0) If pushing the return offset or CS selector onto the stack exceeds the bounds of the stack segment when no stack switch occurs.

If a memory operand effective address is outside the SS segment limit.

If the stack address is in a non-canonical form.

#SS(selector) If pushing the old values of SS selector, stack pointer, EFLAGS, CS selector, offset, or error code onto the stack violates the canonical boundary when a stack switch occurs.

#NP(selector) If a code segment or 64-bit call gate is not present.

#TS(selector) If the load of the new RSP exceeds the limit of the TSS.

#UD (64-bit mode only) If a far call is direct to an absolute address in memory.

If the LOCK prefix is used.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
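Many of the 64-bit mode conditions above depend on whether an address is canonical. A minimal sketch, assuming the linear-address width is supplied by the caller (48 bits with 4-level paging, 57 with 5-level paging); the function name `is_canonical` and its `linear_bits` parameter are hypothetical. An address is treated as canonical when every bit from bit 63 down to the most significant implemented bit has the same value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true if addr is canonical for a linear-address width of
   linear_bits (assumed to be 48 or 57). Bits 63 through (linear_bits - 1)
   must all be 0 or all be 1. */
static bool is_canonical(uint64_t addr, unsigned linear_bits)
{
    uint64_t upper = addr >> (linear_bits - 1);              /* top bit and above */
    uint64_t all_ones = (UINT64_C(1) << (65 - linear_bits)) - 1;
    return upper == 0 || upper == all_ones;
}
```

With 48-bit linear addresses, 00007FFFFFFFFFFFH and FFFF800000000000H are the two edges of the canonical ranges, while 0000800000000000H falls in the non-canonical gap.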
CBW/CWDE/CDQE—Convert Byte to Word/Convert Word to Doubleword/Convert Doubleword to Quadword


Opcode        Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
98            CBW          NP     Valid        Valid            AX <- sign-extend of AL.
98            CWDE         NP     Valid        Valid            EAX <- sign-extend of AX.
REX.W + 98    CDQE         NP     Valid        N.E.             RAX <- sign-extend of EAX.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Double the size of the source operand by means of sign extension. The CBW (convert byte to word) instruction copies the sign (bit 7) of the byte in the AL register into every bit in the AH register. The CWDE (convert word to doubleword) instruction copies the sign (bit 15) of the word in the AX register into the high 16 bits of the EAX register.

CBW and CWDE reference the same opcode. The CBW instruction is intended for use when the operand-size attribute is 16; CWDE is intended for use when the operand-size attribute is 32. Some assemblers may force the operand size. Others may treat these two mnemonics as synonyms (CBW/CWDE) and use the setting of the operand-size attribute to determine the size of values to be converted.

In 64-bit mode, the default operation size is the size of the destination register. Use of the REX.W prefix promotes this instruction (CDQE when promoted) to operate on 64-bit operands. In that case, CDQE copies the sign (bit 31) of the doubleword in the EAX register into the high 32 bits of RAX.
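The three forms differ only in which sign bit is copied and how wide the result is. A minimal C model on plain integers, with hypothetical function names (`model_cbw`, `model_cwde`, `model_cdqe`): each takes the old register value and returns the new one, replicating the sign bit explicitly with masks rather than relying on signed conversions.

```c
#include <stdint.h>

/* CBW: AX <- sign-extend of AL (bit 7 is copied into every bit of AH). */
static uint16_t model_cbw(uint16_t ax)
{
    uint16_t al = ax & 0x00FFu;
    return (al & 0x0080u) ? (uint16_t)(0xFF00u | al) : al;
}

/* CWDE: EAX <- sign-extend of AX (bit 15 is copied into EAX[31:16]). */
static uint32_t model_cwde(uint32_t eax)
{
    uint32_t ax = eax & 0x0000FFFFu;
    return (ax & 0x8000u) ? (0xFFFF0000u | ax) : ax;
}

/* CDQE: RAX <- sign-extend of EAX (bit 31 is copied into RAX[63:32]). */
static uint64_t model_cdqe(uint64_t rax)
{
    uint64_t eax = rax & UINT64_C(0xFFFFFFFF);
    return (eax & UINT64_C(0x80000000)) ? (UINT64_C(0xFFFFFFFF00000000) | eax) : eax;
}
```

For example, with AL = 80H, CBW produces AX = FF80H regardless of the previous contents of AH; with AL = 7FH it produces AX = 007FH.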

Operation 

IF OperandSize = 16 (* Instruction = CBW *)
    THEN
        AX <- SignExtend(AL);
    ELSE IF (OperandSize = 32, Instruction = CWDE)
        THEN
            EAX <- SignExtend(AX);
        ELSE (* 64-Bit Mode, OperandSize = 64, Instruction = CDQE *)
            RAX <- SignExtend(EAX);
    FI;
FI;

Flags Affected 

None. 

Exceptions (All Operating Modes) 

#UD If the LOCK prefix is used. 


CLAC-Clear AC Flag in EFLAGS Register

Opcode/Instruction  Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
0F 01 CA            NP     V/V                     SMAP                Clear the AC flag in the EFLAGS register.
CLAC


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Clears the AC flag bit in the EFLAGS register. This disables any alignment checking of user-mode data accesses. If the SMAP bit is set in the CR4 register, this disallows explicit supervisor-mode data accesses to user-mode pages.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. Attempts to execute CLAC when 
CPL > 0 cause #UD. 
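A simplified model of the two checks this section describes: whether SMAP blocks an explicit supervisor-mode data access to a user-mode page, and when CLAC raises #UD in protected, compatibility, and 64-bit mode. It is a sketch under stated assumptions: only CR4.SMAP and EFLAGS.AC are considered (an explicit access is assumed to be permitted again when AC = 1), implicit supervisor accesses are ignored, and both function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model: may an explicit supervisor-mode data access touch a
   user-mode page? Only CR4.SMAP and EFLAGS.AC are considered here. */
static bool smap_permits_explicit_access(bool cr4_smap, bool eflags_ac)
{
    /* With CR4.SMAP = 1, clearing AC (CLAC) disallows the access. */
    return !cr4_smap || eflags_ac;
}

/* Hypothetical model of CLAC's #UD conditions in protected, compatibility,
   and 64-bit mode. cpuid7_ebx is CPUID.(EAX=07H, ECX=0H):EBX. */
static bool clac_raises_ud(bool lock_prefix, unsigned cpl, uint32_t cpuid7_ebx)
{
    bool smap_supported = ((cpuid7_ebx >> 20) & 1u) != 0; /* SMAP is bit 20 */
    return lock_prefix || cpl > 0 || !smap_supported;
}
```

In real-address mode the CPL test does not apply, so only the LOCK-prefix and CPUID conditions from the exception lists would remain.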

Operation 

EFLAGS.AC <- 0;

Flags Affected 

AC cleared. Other flags are unaffected. 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

If the CPL > 0.

If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0.

Virtual-8086 Mode Exceptions

#UD The CLAC instruction is not recognized in virtual-8086 mode. 

Compatibility Mode Exceptions 

#UD If the LOCK prefix is used. 

If the CPL > 0.

If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. 

64-Bit Mode Exceptions 

#UD If the LOCK prefix is used. 

If the CPL > 0.

If CPUID.(EAX=07H, ECX=0H):EBX.SMAP[bit 20] = 0. 
CLC—Clear Carry Flag


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
F8      CLC          NP     Valid        Valid            Clear CF flag.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Clears the CF flag in the EFLAGS register. Operation is the same in all modes. 


Operation 

CF <- 0;

Flags Affected 

The CF flag is set to 0. The OF, ZF, SF, AF, and PF flags are unaffected. 

Exceptions (All Operating Modes) 

#UD If the LOCK prefix is used. 


CLD—Clear Direction Flag


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
FC      CLD          NP     Valid        Valid            Clear DF flag.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Clears the DF flag in the EFLAGS register. When the DF flag is set to 0, string operations increment the index registers (ESI and/or EDI). Operation is the same in all modes.
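The effect of DF on a string operation can be sketched with a byte-copy loop in the style of REP MOVSB. This is a hypothetical model, not the instruction's definition: memory is a single byte array, ESI and EDI are array indices, ECX is the repeat count, and `model_rep_movsb` is an invented name. After CLD (DF = 0) both indices move upward; with DF = 1 they move downward.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of REP MOVSB index handling: each iteration copies
   mem[*esi] to mem[*edi], then steps both indices by +1 when DF = 0
   (after CLD) or by -1 when DF = 1. */
static void model_rep_movsb(uint8_t *mem, size_t *esi, size_t *edi,
                            size_t ecx, bool df)
{
    while (ecx-- > 0) {
        mem[*edi] = mem[*esi];
        if (df) {
            (*esi)--;   /* DF = 1: decrement (unsigned wrap is well defined) */
            (*edi)--;
        } else {
            (*esi)++;   /* DF = 0: increment */
            (*edi)++;
        }
    }
}
```

Copying the same four bytes with DF = 1 requires starting both indices at the last element, which is why code that expects incrementing string operations typically issues CLD first.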


Operation 

DF <- 0;

Flags Affected 

The DF flag is set to 0. The CF, OF, ZF, SF, AF, and PF flags are unaffected. 

Exceptions (All Operating Modes) 

#UD If the LOCK prefix is used. 
CLFLUSH—Flush Cache Line


Opcode     Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
0F AE /7   CLFLUSH m8   M      Valid        Valid            Flushes cache line containing m8.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (w)  NA         NA         NA


Description 

Invalidates from every level of the cache hierarchy in the cache coherence domain the cache line that contains the 
linear address specified with the memory operand. If that cache line contains modified data at any level of the 
cache hierarchy, that data is written back to memory. The source operand is a byte memory location. 

The availability of CLFLUSH is indicated by the presence of the CPUID feature flag CLFSH (CPUID.01H:EDX[bit 19]). The aligned cache line size affected is also indicated with the CPUID instruction (bits 8 through 15 of the EBX register when the initial value in the EAX register is 1).

The memory attribute of the page containing the affected line has no effect on the behavior of this instruction. It 
should be noted that processors are free to speculatively fetch and cache data from system memory regions 
assigned a memory-type allowing for speculative reads (such as, the WB, WC, and WT memory types). PREFETCHh 
instructions can be used to provide the processor with hints for this speculative behavior. Because this speculative 
fetching can occur at any time and is not tied to instruction execution, the CLFLUSH instruction is not ordered with 
respect to PREFETCHh instructions or any of the speculative fetching mechanisms (that is, data can be speculatively loaded into a cache line just before, during, or after the execution of a CLFLUSH instruction that references
the cache line). 

Executions of the CLFLUSH instruction are ordered with respect to each other and with respect to writes, locked read-modify-write instructions, fence instructions, and executions of CLFLUSHOPT to the same cache line (see footnote 1). They are not ordered with respect to executions of CLFLUSHOPT to different cache lines.

The CLFLUSH instruction can be used at all privilege levels and is subject to all permission checking and faults associated with a byte load (and in addition, a CLFLUSH instruction is allowed to flush a linear address in an execute-only segment). Like a load, the CLFLUSH instruction sets the A bit but not the D bit in the page tables.

In some implementations, the CLFLUSH instruction may always cause a transactional abort with Transactional Synchronization Extensions (TSX). The CLFLUSH instruction is not expected to be commonly used inside typical transactional regions. However, programmers must not rely on the CLFLUSH instruction to force a transactional abort, since whether it causes one is implementation dependent.

The CLFLUSH instruction was introduced with the SSE2 extensions; however, because it has its own CPUID feature flag, it can be implemented in IA-32 processors that do not include the SSE2 extensions. Also, detecting the presence of the SSE2 extensions with the CPUID instruction does not guarantee that the CLFLUSH instruction is implemented in the processor.

CLFLUSH operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

Flush_Cache_Line(SRC); 

Intel C/C++ Compiler Intrinsic Equivalents 

CLFLUSH: void _mm_clflush(void const *p) 


1. Earlier versions of this manual specified that executions of the CLFLUSH instruction were ordered only by the MFENCE instruction. 
All processors implementing the CLFLUSH instruction also order it relative to the other operations enumerated above. 


Protected Mode Exceptions

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 

#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

#UD If CPUID.01H:EDX.CLFSH[bit 19] = 0. 

If the LOCK prefix is used. 

If an instruction prefix F2H or F3H is used.

Real-Address Mode Exceptions 

#GP If any part of the operand lies outside the effective address space from 0 to FFFFH.

#UD If CPUID.01H:EDX.CLFSH[bit 19] = 0. 

If the LOCK prefix is used. 

If an instruction prefix F2H or F3H is used.

Virtual-8086 Mode Exceptions

Same exceptions as in real address mode. 

#PF(fault-code) For a page fault. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#UD If CPUID.01H:EDX.CLFSH[bit 19] = 0.

If the LOCK prefix is used.

If an instruction prefix F2H or F3H is used.
CLFLUSHOPT—Flush Cache Line Optimized


Opcode        Instruction     Op/En  64-bit Mode  Compat/Leg Mode  Description
66 0F AE /7   CLFLUSHOPT m8   M      Valid        Valid            Flushes cache line containing m8.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (w)  NA         NA         NA


Description 

Invalidates from every level of the cache hierarchy in the cache coherence domain the cache line that contains the 
linear address specified with the memory operand. If that cache line contains modified data at any level of the 
cache hierarchy, that data is written back to memory. The source operand is a byte memory location. 

The availability of CLFLUSHOPT is indicated by the presence of the CPUID feature flag CLFLUSHOPT (CPUID.(EAX=7,ECX=0):EBX[bit 23]). The aligned cache line size affected is also indicated with the CPUID instruction (bits 8 through 15 of the EBX register when the initial value in the EAX register is 1).

The memory attribute of the page containing the affected line has no effect on the behavior of this instruction. It 
should be noted that processors are free to speculatively fetch and cache data from system memory regions 
assigned a memory-type allowing for speculative reads (such as, the WB, WC, and WT memory types). PREFETCHh 
instructions can be used to provide the processor with hints for this speculative behavior. Because this speculative 
fetching can occur at any time and is not tied to instruction execution, the CLFLUSHOPT instruction is not ordered with respect to PREFETCHh instructions or any of the speculative fetching mechanisms (that is, data can be speculatively loaded into a cache line just before, during, or after the execution of a CLFLUSHOPT instruction that references
the cache line). 

Executions of the CLFLUSHOPT instruction are ordered with respect to fence instructions and to locked read-modify-write instructions; they are also ordered with respect to the following accesses to the cache line being invalidated: writes, executions of CLFLUSH, and executions of CLFLUSHOPT. They are not ordered with respect to writes, executions of CLFLUSH, or executions of CLFLUSHOPT that access other cache lines; to enforce ordering with such an operation, software can insert an SFENCE instruction between CLFLUSHOPT and that operation.
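Flushing a buffer means issuing one flush per aligned cache line that overlaps it and then, if later writes to other lines must be ordered after the flushes, a single SFENCE. A portable sketch of the line iteration, assuming a power-of-two line size (for example, the CPUID-reported size); `for_each_cache_line` and the `cache_line_op` callback type are hypothetical, and on real hardware the callback might wrap the `_mm_clflushopt` intrinsic, with `_mm_sfence` issued afterward.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback standing in for a per-line flush. */
typedef void (*cache_line_op)(uintptr_t line_addr, void *ctx);

/* Invoke op once for every cache line overlapping [addr, addr + len).
   line_size must be a power of two. Returns the number of lines visited. */
static size_t for_each_cache_line(uintptr_t addr, size_t len, size_t line_size,
                                  cache_line_op op, void *ctx)
{
    if (len == 0)
        return 0;
    uintptr_t line = addr & ~((uintptr_t)line_size - 1); /* align down */
    uintptr_t end = addr + len;                           /* one past last byte */
    size_t count = 0;
    for (; line < end; line += line_size) {
        if (op)
            op(line, ctx);
        count++;
    }
    /* With real CLFLUSHOPT, an SFENCE here would order these flushes
       before subsequent writes to other cache lines. */
    return count;
}
```

An 8-byte object at 103CH straddles two 64-byte lines (1000H and 1040H), so it needs two flushes even though it is smaller than one line.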

The CLFLUSHOPT instruction can be used at all privilege levels and is subject to all permission checking and faults 
associated with a byte load (and in addition, a CLFLUSHOPT instruction is allowed to flush a linear address in an 
execute-only segment). Like a load, the CLFLUSHOPT instruction sets the A bit but not the D bit in the page tables. 

In some implementations, the CLFLUSHOPT instruction may always cause a transactional abort with Transactional Synchronization Extensions (TSX). The CLFLUSHOPT instruction is not expected to be commonly used inside typical transactional regions. However, programmers must not rely on the CLFLUSHOPT instruction to force a transactional abort, since whether it causes one is implementation dependent.

CLFLUSHOPT operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

Flush_Cache_Line_Optimized(SRC); 

Intel C/C++ Compiler Intrinsic Equivalents 

CLFLUSHOPT: void _mm_clflushopt(void const *p)


Protected Mode Exceptions

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 

#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

#UD If CPUID.(EAX=7,ECX=0):EBX.CLFLUSHOPT[bit 23] = 0. 

If the LOCK prefix is used. 

If an instruction prefix F2H or F3H is used.

Real-Address Mode Exceptions 

#GP If any part of the operand lies outside the effective address space from 0 to FFFFH.

#UD If CPUID.(EAX=7,ECX=0):EBX.CLFLUSHOPT[bit 23] = 0. 

If the LOCK prefix is used. 

If an instruction prefix F2H or F3H is used.

Virtual-8086 Mode Exceptions

Same exceptions as in real address mode. 

#PF(fault-code) For a page fault. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.

#UD If CPUID.(EAX=7,ECX=0):EBX.CLFLUSHOPT[bit 23] = 0.

If the LOCK prefix is used.

If an instruction prefix F2H or F3H is used.
CLI — Clear Interrupt Flag


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
FA      CLI          NP     Valid        Valid            Clear interrupt flag; interrupts disabled when
                                                          interrupt flag cleared.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

If protected-mode virtual interrupts are not enabled, CLI clears the IF flag in the EFLAGS register. No other flags are affected. Clearing the IF flag causes the processor to ignore maskable external interrupts. The IF flag and the CLI and STI instructions have no effect on the generation of exceptions and NMI interrupts.

When protected-mode virtual interrupts are enabled, CPL is 3, and IOPL is less than 3, CLI clears the VIF flag in the EFLAGS register, leaving IF unaffected. Table 3-7 indicates the action of the CLI instruction depending on the processor operating mode and the CPL/IOPL of the running program or procedure.

Operation is the same in all modes. 


Table 3-7. Decision Table for CLI Results 


PE  VM  IOPL   CPL  PVI  VIP  VME  CLI Result
0   X   X      X    X    X    X    IF = 0
1   0   >=CPL  X    X    X    X    IF = 0
1   0   <CPL   3    1    X    X    VIF = 0
1   0   <CPL   <3   X    X    X    GP Fault
1   0   <CPL   X    0    X    X    GP Fault
1   1   3      X    X    X    X    IF = 0
1   1   <3     X    X    X    1    VIF = 0
1   1   <3     X    X    X    0    GP Fault


NOTES: 

* X = This setting has no impact. 


Operation 

IF PE = 0
    THEN
        IF <- 0; (* Reset Interrupt Flag *)
    ELSE
        IF VM = 0
            THEN
                IF IOPL >= CPL
                    THEN
                        IF <- 0; (* Reset Interrupt Flag *)
                    ELSE
                        IF ((IOPL < CPL) and (CPL = 3) and (PVI = 1))
                            THEN
                                VIF <- 0; (* Reset Virtual Interrupt Flag *)
                            ELSE
                                #GP(0);
                        FI;
                FI;
            ELSE (* VM = 1 *)
                IF IOPL = 3
                    THEN
                        IF <- 0; (* Reset Interrupt Flag *)
                    ELSE
                        IF (IOPL < 3) AND (VME = 1)
                            THEN
                                VIF <- 0; (* Reset Virtual Interrupt Flag *)
                            ELSE
                                #GP(0);
                        FI;
                FI;
        FI;
FI;
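The Operation pseudocode and Table 3-7 can be cross-checked with a direct C transcription. A minimal sketch: the inputs are passed as plain values (PE from CR0, VM and IOPL from EFLAGS, PVI and VME from CR4), VIP is omitted because it is "X" in every row, and `cli_outcome` with its result enum is a hypothetical name.

```c
#include <stdbool.h>

enum cli_result { CLI_IF_CLEARED, CLI_VIF_CLEARED, CLI_GP_FAULT };

/* iopl and cpl are in 0..3. Mirrors the CLI Operation pseudocode. */
static enum cli_result cli_outcome(bool pe, bool vm, unsigned iopl,
                                   unsigned cpl, bool pvi, bool vme)
{
    if (!pe)
        return CLI_IF_CLEARED;              /* real-address mode */
    if (!vm) {                              /* protected mode */
        if (iopl >= cpl)
            return CLI_IF_CLEARED;
        if (cpl == 3 && pvi)
            return CLI_VIF_CLEARED;         /* protected-mode virtual interrupts */
        return CLI_GP_FAULT;
    }
    if (iopl == 3)                          /* virtual-8086 mode */
        return CLI_IF_CLEARED;
    if (vme)
        return CLI_VIF_CLEARED;
    return CLI_GP_FAULT;
}
```

Each return statement corresponds to one or more rows of Table 3-7; the two GP Fault rows for VM = 0 both fall through to the same `CLI_GP_FAULT` return.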

Flags Affected 

If protected-mode virtual interrupts are not enabled, IF is set to 0 if the CPL is equal to or less than the IOPL; otherwise, it is not affected. Other flags are unaffected.

When protected-mode virtual interrupts are enabled, CPL is 3, and IOPL is less than 3, CLI clears the VIF flag in the EFLAGS register, leaving IF unaffected. Other flags are unaffected.

Protected Mode Exceptions 

#GP(0) If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#GP(0) If the CPL is greater (has less privilege) than the IOPL of the current program or procedure.

#UD If the LOCK prefix is used. 
CLTS—Clear Task-Switched Flag in CR0


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
0F 06   CLTS         NP     Valid        Valid            Clears TS flag in CR0.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Clears the task-switched (TS) flag in the CR0 register. This instruction is intended for use in operating-system procedures. It is a privileged instruction that can only be executed at a CPL of 0. It is allowed to be executed in real-address mode to allow initialization for protected mode.

The processor sets the TS flag every time a task switch occurs. The flag is used to synchronize the saving of FPU context in multitasking applications. See the description of the TS flag in the section titled "Control Registers" in Chapter 2 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for more information about this flag.
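The privilege rule and the single-bit update can be modeled together. A minimal sketch with a hypothetical function name: it returns false to stand for #GP(0) when CLTS executes outside real-address mode at a CPL other than 0 (which includes virtual-8086 mode, where the CPL is 3), and otherwise clears only CR0.TS (bit 3).

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_TS (UINT32_C(1) << 3)   /* CR0.TS is bit 3 */

/* Returns false to model #GP(0); CR0 is then left unchanged. */
static bool model_clts(uint32_t *cr0, unsigned cpl, bool real_address_mode)
{
    if (!real_address_mode && cpl != 0)
        return false;               /* privileged: CPL must be 0 */
    *cr0 &= ~CR0_TS;                /* CR0.TS[bit 3] <- 0; other bits unchanged */
    return true;
}
```

For example, a CR0 value of 8000003BH becomes 80000033H: only bit 3 changes.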

CLTS operation is the same in non-64-bit modes and 64-bit mode. 

See Chapter 25, "VMX Non-Root Operation," of the Intel® 64 and IA-32 Architectures Software Developer's 
Manual, Volume 3C, for more information about the behavior of this instruction in VMX non-root operation. 

Operation 

CR0.TS[bit 3] <- 0;

Flags Affected 

The TS flag in the CR0 register is cleared.

Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) CLTS is not recognized in virtual-8086 mode. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#GP(0) If the CPL is greater than 0. 

#UD If the LOCK prefix is used. 


CLWB—Cache Line Write Back


Opcode/Instruction  Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
66 0F AE /6         M      V/V                     CLWB                Writes back modified cache line containing m8, and may
CLWB m8                                                                retain the line in cache hierarchy in non-modified state.


Instruction Operand Encoding (see footnote 1)

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (w)  NA         NA         NA
NA 


Description 

Writes back to memory the cache line (if modified) that contains the linear address specified with the memory 
operand from any level of the cache hierarchy in the cache coherence domain. The line may be retained in the 
cache hierarchy in non-modified state. Retaining the line in the cache hierarchy is a performance optimization 
(treated as a hint by hardware) to reduce the possibility of cache miss on a subsequent access. Hardware may 
choose to retain the line at any of the levels in the cache hierarchy, and in some cases, may invalidate the line from 
the cache hierarchy. The source operand is a byte memory location. 

The availability of the CLWB instruction is indicated by the presence of the CPUID feature flag CLWB (CPUID.(EAX=07H, ECX=0H):EBX[bit 24]; see "CPUID — CPU Identification" in this chapter). The aligned cache line size affected is also indicated with the CPUID instruction (bits 8 through 15 of the EBX register when the initial value in the EAX register is 1).
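The structured-extended feature bits cited in this chapter for SMAP (bit 20), CLFLUSHOPT (bit 23), and CLWB (bit 24) all live in the same CPUID register and can be decoded together. A minimal sketch; the macro names, `flush_features`, and `decode_cpuid7_ebx` are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits in CPUID.(EAX=07H, ECX=0H):EBX, as cited in this chapter. */
#define CPUID7_EBX_SMAP       (UINT32_C(1) << 20)
#define CPUID7_EBX_CLFLUSHOPT (UINT32_C(1) << 23)
#define CPUID7_EBX_CLWB       (UINT32_C(1) << 24)

struct flush_features {
    bool smap;
    bool clflushopt;
    bool clwb;
};

static struct flush_features decode_cpuid7_ebx(uint32_t ebx)
{
    struct flush_features f;
    f.smap = (ebx & CPUID7_EBX_SMAP) != 0;
    f.clflushopt = (ebx & CPUID7_EBX_CLFLUSHOPT) != 0;
    f.clwb = (ebx & CPUID7_EBX_CLWB) != 0;
    return f;
}
```

On recent GCC and Clang versions, the EBX value can be read with `__get_cpuid_count(7, 0, ...)` from `<cpuid.h>`; checking the CLWB bit before use avoids the #UD listed in the exceptions.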

The memory attribute of the page containing the affected line has no effect on the behavior of this instruction. It 
should be noted that processors are free to speculatively fetch and cache data from system memory regions that 
are assigned a memory-type allowing for speculative reads (such as, the WB, WC, and WT memory types). 
PREFETCHh instructions can be used to provide the processor with hints for this speculative behavior. Because this 
speculative fetching can occur at any time and is not tied to instruction execution, the CLWB instruction is not 
ordered with respect to PREFETCHh instructions or any of the speculative fetching mechanisms (that is, data can 
be speculatively loaded into a cache line just before, during, or after the execution of a CLWB instruction that references the cache line).

CLWB instruction is ordered only by store-fencing operations. For example, software can use an SFENCE, MFENCE, 
XCHG, or LOCK-prefixed instructions to ensure that previous stores are included in the write-back. The CLWB instruction need not be ordered by another CLWB or CLFLUSHOPT instruction. CLWB is implicitly ordered with older stores
executed by the logical processor to the same address. 

For usages that require only writing back modified data from cache lines to memory (do not require the line to be 
invalidated), and expect to subsequently access the data, software is recommended to use CLWB (with appropriate 
fencing) instead of CLFLUSH or CLFLUSHOPT for improved performance. 

The CLWB instruction can be used at all privilege levels and is subject to all permission checking and faults associated with a byte load. Like a load, the CLWB instruction sets the accessed flag but not the dirty flag in the page tables.

In some implementations, the CLWB instruction may always cause a transactional abort with Transactional Synchronization Extensions (TSX). The CLWB instruction is not expected to be commonly used inside typical transactional regions. However, programmers must not rely on the CLWB instruction to force a transactional abort, since whether it causes one is implementation dependent.

Operation 

Cache_Line_Write_Back(m8); 

Flags Affected 

None. 


1. ModRM.MOD != 011B
C/C++ Compiler Intrinsic Equivalent

CLWB void _mm_clwb(void const *p); 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

If CPUID.(EAX=07H, ECX=0H):EBX.CLWB[bit 24] = 0.

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 

#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

If CPUID.(EAX=07H, ECX=0H):EBX.CLWB[bit 24] = 0.

#GP If any part of the operand lies outside the effective address space from 0 to FFFFH.

Virtual-8086 Mode Exceptions

Same exceptions as in real address mode. 

#PF(fault-code) For a page fault. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 


#UD If the LOCK prefix is used.

If CPUID.(EAX=07H, ECX=0H):EBX.CLWB[bit 24] = 0.

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) For a page fault.


CMC—Complement Carry Flag


Opcode  Instruction  Op/En  64-bit Mode  Compat/Leg Mode  Description
F5      CMC          NP     Valid        Valid            Complement CF flag.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Complements the CF flag in the EFLAGS register. CMC operation is the same in non-64-bit modes and 64-bit mode. 
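Since CF is bit 0 of EFLAGS, CMC amounts to flipping that one bit, whereas CLC forces it to 0. A minimal sketch on a plain EFLAGS value, with hypothetical function names, showing that all other bits pass through unchanged.

```c
#include <stdint.h>

#define EFLAGS_CF (UINT32_C(1) << 0)   /* CF is bit 0 of EFLAGS */

/* CMC: CF <- NOT CF; every other bit is left unchanged. */
static uint32_t model_cmc(uint32_t eflags)
{
    return eflags ^ EFLAGS_CF;
}

/* CLC, for comparison: CF <- 0; every other bit is left unchanged. */
static uint32_t model_clc(uint32_t eflags)
{
    return eflags & ~EFLAGS_CF;
}
```

Applying `model_cmc` twice restores the original value, while `model_clc` is idempotent.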


Operation 

EFLAGS.CF[bit 0] <- NOT EFLAGS.CF[bit 0];

Flags Affected 

The CF flag contains the complement of its original value. The OF, ZF, SF, AF, and PF flags are unaffected. 

Exceptions (All Operating Modes) 

#UD If the LOCK prefix is used. 
CMOVcc—Conditional Move


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 47 /r | CMOVA r16, r/m16 | RM | Valid | Valid | Move if above (CF=0 and ZF=0).
0F 47 /r | CMOVA r32, r/m32 | RM | Valid | Valid | Move if above (CF=0 and ZF=0).
REX.W + 0F 47 /r | CMOVA r64, r/m64 | RM | Valid | N.E. | Move if above (CF=0 and ZF=0).
0F 43 /r | CMOVAE r16, r/m16 | RM | Valid | Valid | Move if above or equal (CF=0).
0F 43 /r | CMOVAE r32, r/m32 | RM | Valid | Valid | Move if above or equal (CF=0).
REX.W + 0F 43 /r | CMOVAE r64, r/m64 | RM | Valid | N.E. | Move if above or equal (CF=0).
0F 42 /r | CMOVB r16, r/m16 | RM | Valid | Valid | Move if below (CF=1).
0F 42 /r | CMOVB r32, r/m32 | RM | Valid | Valid | Move if below (CF=1).
REX.W + 0F 42 /r | CMOVB r64, r/m64 | RM | Valid | N.E. | Move if below (CF=1).
0F 46 /r | CMOVBE r16, r/m16 | RM | Valid | Valid | Move if below or equal (CF=1 or ZF=1).
0F 46 /r | CMOVBE r32, r/m32 | RM | Valid | Valid | Move if below or equal (CF=1 or ZF=1).
REX.W + 0F 46 /r | CMOVBE r64, r/m64 | RM | Valid | N.E. | Move if below or equal (CF=1 or ZF=1).
0F 42 /r | CMOVC r16, r/m16 | RM | Valid | Valid | Move if carry (CF=1).
0F 42 /r | CMOVC r32, r/m32 | RM | Valid | Valid | Move if carry (CF=1).
REX.W + 0F 42 /r | CMOVC r64, r/m64 | RM | Valid | N.E. | Move if carry (CF=1).
0F 44 /r | CMOVE r16, r/m16 | RM | Valid | Valid | Move if equal (ZF=1).
0F 44 /r | CMOVE r32, r/m32 | RM | Valid | Valid | Move if equal (ZF=1).
REX.W + 0F 44 /r | CMOVE r64, r/m64 | RM | Valid | N.E. | Move if equal (ZF=1).
0F 4F /r | CMOVG r16, r/m16 | RM | Valid | Valid | Move if greater (ZF=0 and SF=OF).
0F 4F /r | CMOVG r32, r/m32 | RM | Valid | Valid | Move if greater (ZF=0 and SF=OF).
REX.W + 0F 4F /r | CMOVG r64, r/m64 | RM | Valid | N.E. | Move if greater (ZF=0 and SF=OF).
0F 4D /r | CMOVGE r16, r/m16 | RM | Valid | Valid | Move if greater or equal (SF=OF).
0F 4D /r | CMOVGE r32, r/m32 | RM | Valid | Valid | Move if greater or equal (SF=OF).
REX.W + 0F 4D /r | CMOVGE r64, r/m64 | RM | Valid | N.E. | Move if greater or equal (SF=OF).
0F 4C /r | CMOVL r16, r/m16 | RM | Valid | Valid | Move if less (SF≠OF).
0F 4C /r | CMOVL r32, r/m32 | RM | Valid | Valid | Move if less (SF≠OF).
REX.W + 0F 4C /r | CMOVL r64, r/m64 | RM | Valid | N.E. | Move if less (SF≠OF).
0F 4E /r | CMOVLE r16, r/m16 | RM | Valid | Valid | Move if less or equal (ZF=1 or SF≠OF).
0F 4E /r | CMOVLE r32, r/m32 | RM | Valid | Valid | Move if less or equal (ZF=1 or SF≠OF).
REX.W + 0F 4E /r | CMOVLE r64, r/m64 | RM | Valid | N.E. | Move if less or equal (ZF=1 or SF≠OF).
0F 46 /r | CMOVNA r16, r/m16 | RM | Valid | Valid | Move if not above (CF=1 or ZF=1).
0F 46 /r | CMOVNA r32, r/m32 | RM | Valid | Valid | Move if not above (CF=1 or ZF=1).
REX.W + 0F 46 /r | CMOVNA r64, r/m64 | RM | Valid | N.E. | Move if not above (CF=1 or ZF=1).
0F 42 /r | CMOVNAE r16, r/m16 | RM | Valid | Valid | Move if not above or equal (CF=1).
0F 42 /r | CMOVNAE r32, r/m32 | RM | Valid | Valid | Move if not above or equal (CF=1).
REX.W + 0F 42 /r | CMOVNAE r64, r/m64 | RM | Valid | N.E. | Move if not above or equal (CF=1).
0F 43 /r | CMOVNB r16, r/m16 | RM | Valid | Valid | Move if not below (CF=0).
0F 43 /r | CMOVNB r32, r/m32 | RM | Valid | Valid | Move if not below (CF=0).
REX.W + 0F 43 /r | CMOVNB r64, r/m64 | RM | Valid | N.E. | Move if not below (CF=0).
0F 47 /r | CMOVNBE r16, r/m16 | RM | Valid | Valid | Move if not below or equal (CF=0 and ZF=0).
0F 47 /r | CMOVNBE r32, r/m32 | RM | Valid | Valid | Move if not below or equal (CF=0 and ZF=0).
REX.W + 0F 47 /r | CMOVNBE r64, r/m64 | RM | Valid | N.E. | Move if not below or equal (CF=0 and ZF=0).
0F 43 /r | CMOVNC r16, r/m16 | RM | Valid | Valid | Move if not carry (CF=0).
0F 43 /r | CMOVNC r32, r/m32 | RM | Valid | Valid | Move if not carry (CF=0).
REX.W + 0F 43 /r | CMOVNC r64, r/m64 | RM | Valid | N.E. | Move if not carry (CF=0).
0F 45 /r | CMOVNE r16, r/m16 | RM | Valid | Valid | Move if not equal (ZF=0).
0F 45 /r | CMOVNE r32, r/m32 | RM | Valid | Valid | Move if not equal (ZF=0).
REX.W + 0F 45 /r | CMOVNE r64, r/m64 | RM | Valid | N.E. | Move if not equal (ZF=0).
0F 4E /r | CMOVNG r16, r/m16 | RM | Valid | Valid | Move if not greater (ZF=1 or SF≠OF).
0F 4E /r | CMOVNG r32, r/m32 | RM | Valid | Valid | Move if not greater (ZF=1 or SF≠OF).
REX.W + 0F 4E /r | CMOVNG r64, r/m64 | RM | Valid | N.E. | Move if not greater (ZF=1 or SF≠OF).
0F 4C /r | CMOVNGE r16, r/m16 | RM | Valid | Valid | Move if not greater or equal (SF≠OF).
0F 4C /r | CMOVNGE r32, r/m32 | RM | Valid | Valid | Move if not greater or equal (SF≠OF).
REX.W + 0F 4C /r | CMOVNGE r64, r/m64 | RM | Valid | N.E. | Move if not greater or equal (SF≠OF).
0F 4D /r | CMOVNL r16, r/m16 | RM | Valid | Valid | Move if not less (SF=OF).
0F 4D /r | CMOVNL r32, r/m32 | RM | Valid | Valid | Move if not less (SF=OF).
REX.W + 0F 4D /r | CMOVNL r64, r/m64 | RM | Valid | N.E. | Move if not less (SF=OF).
0F 4F /r | CMOVNLE r16, r/m16 | RM | Valid | Valid | Move if not less or equal (ZF=0 and SF=OF).
0F 4F /r | CMOVNLE r32, r/m32 | RM | Valid | Valid | Move if not less or equal (ZF=0 and SF=OF).
REX.W + 0F 4F /r | CMOVNLE r64, r/m64 | RM | Valid | N.E. | Move if not less or equal (ZF=0 and SF=OF).
0F 41 /r | CMOVNO r16, r/m16 | RM | Valid | Valid | Move if not overflow (OF=0).
0F 41 /r | CMOVNO r32, r/m32 | RM | Valid | Valid | Move if not overflow (OF=0).
REX.W + 0F 41 /r | CMOVNO r64, r/m64 | RM | Valid | N.E. | Move if not overflow (OF=0).
0F 4B /r | CMOVNP r16, r/m16 | RM | Valid | Valid | Move if not parity (PF=0).
0F 4B /r | CMOVNP r32, r/m32 | RM | Valid | Valid | Move if not parity (PF=0).
REX.W + 0F 4B /r | CMOVNP r64, r/m64 | RM | Valid | N.E. | Move if not parity (PF=0).
0F 49 /r | CMOVNS r16, r/m16 | RM | Valid | Valid | Move if not sign (SF=0).
0F 49 /r | CMOVNS r32, r/m32 | RM | Valid | Valid | Move if not sign (SF=0).
REX.W + 0F 49 /r | CMOVNS r64, r/m64 | RM | Valid | N.E. | Move if not sign (SF=0).
0F 45 /r | CMOVNZ r16, r/m16 | RM | Valid | Valid | Move if not zero (ZF=0).
0F 45 /r | CMOVNZ r32, r/m32 | RM | Valid | Valid | Move if not zero (ZF=0).
REX.W + 0F 45 /r | CMOVNZ r64, r/m64 | RM | Valid | N.E. | Move if not zero (ZF=0).
0F 40 /r | CMOVO r16, r/m16 | RM | Valid | Valid | Move if overflow (OF=1).
0F 40 /r | CMOVO r32, r/m32 | RM | Valid | Valid | Move if overflow (OF=1).
REX.W + 0F 40 /r | CMOVO r64, r/m64 | RM | Valid | N.E. | Move if overflow (OF=1).
0F 4A /r | CMOVP r16, r/m16 | RM | Valid | Valid | Move if parity (PF=1).
0F 4A /r | CMOVP r32, r/m32 | RM | Valid | Valid | Move if parity (PF=1).
REX.W + 0F 4A /r | CMOVP r64, r/m64 | RM | Valid | N.E. | Move if parity (PF=1).
0F 4A /r | CMOVPE r16, r/m16 | RM | Valid | Valid | Move if parity even (PF=1).
0F 4A /r | CMOVPE r32, r/m32 | RM | Valid | Valid | Move if parity even (PF=1).
REX.W + 0F 4A /r | CMOVPE r64, r/m64 | RM | Valid | N.E. | Move if parity even (PF=1).
0F 4B /r | CMOVPO r16, r/m16 | RM | Valid | Valid | Move if parity odd (PF=0).
0F 4B /r | CMOVPO r32, r/m32 | RM | Valid | Valid | Move if parity odd (PF=0).
REX.W + 0F 4B /r | CMOVPO r64, r/m64 | RM | Valid | N.E. | Move if parity odd (PF=0).
0F 48 /r | CMOVS r16, r/m16 | RM | Valid | Valid | Move if sign (SF=1).
0F 48 /r | CMOVS r32, r/m32 | RM | Valid | Valid | Move if sign (SF=1).
REX.W + 0F 48 /r | CMOVS r64, r/m64 | RM | Valid | N.E. | Move if sign (SF=1).
0F 44 /r | CMOVZ r16, r/m16 | RM | Valid | Valid | Move if zero (ZF=1).
0F 44 /r | CMOVZ r32, r/m32 | RM | Valid | Valid | Move if zero (ZF=1).
REX.W + 0F 44 /r | CMOVZ r64, r/m64 | RM | Valid | N.E. | Move if zero (ZF=1).


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA


Description 

The CMOVcc instructions check the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, 
and ZF) and perform a move operation if the flags are in a specified state (or condition). A condition code (cc) is 
associated with each instruction to indicate the condition being tested for. If the condition is not satisfied, a move 
is not performed and execution continues with the instruction following the CMOVcc instruction. 

These instructions can move 16-bit, 32-bit or 64-bit values from memory to a general-purpose register or from one 
general-purpose register to another. Conditional moves of 8-bit register operands are not supported. 

The condition for each CMOVcc mnemonic is given in the description column of the above table. The terms "less" 
and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used for 
unsigned integers. 
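The mnemonics in the table above follow a regular pattern that can be modeled in software. The low four bits of the second opcode byte (0F 40H through 0F 4FH) select the condition, and each odd value tests the negation of the even value just below it. The C sketch below is an illustrative model of that decoding, not a description of processor internals. The `struct flags` type and the `cc_holds` function are hypothetical names introduced here.

```c
#include <stdbool.h>

/* Hypothetical snapshot of the status flags tested by CMOVcc. */
struct flags {
    bool cf, pf, zf, sf, of;
};

/* Evaluate the condition selected by the low nibble of the second opcode
 * byte (0F 40H..0F 4FH). Odd values negate the preceding even value,
 * e.g. 0F 44H (CMOVE/CMOVZ) vs. 0F 45H (CMOVNE/CMOVNZ). */
static bool cc_holds(unsigned cc, struct flags f)
{
    bool r;
    switch ((cc >> 1) & 7u) {
    case 0:  r = f.of;                      break; /* O  / NO               */
    case 1:  r = f.cf;                      break; /* B  / AE  (unsigned)   */
    case 2:  r = f.zf;                      break; /* E  / NE               */
    case 3:  r = f.cf || f.zf;              break; /* BE / A   (unsigned)   */
    case 4:  r = f.sf;                      break; /* S  / NS               */
    case 5:  r = f.pf;                      break; /* P  / NP               */
    case 6:  r = f.sf != f.of;              break; /* L  / GE  (signed)     */
    default: r = f.zf || (f.sf != f.of);    break; /* LE / G   (signed)     */
    }
    return (cc & 1u) ? !r : r;
}
```

For example, `cc_holds(0x7, f)` models CMOVA/CMOVNBE (opcode 0F 47H), and `cc_holds(0xC, f)` models CMOVL/CMOVNGE (opcode 0F 4CH).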

Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are 
defined for some opcodes. For example, the CMOVA (conditional move if above) instruction and the CMOVNBE 
(conditional move if not below or equal) instruction are alternate mnemonics for the opcode 0F 47H.

The CMOVcc instructions were introduced in P6 family processors; however, these instructions may not be 
supported by all IA-32 processors. Software can determine if the CMOVcc instructions are supported by checking 
the processor's feature information with the CPUID instruction (see "CPUID—CPU Identification" in this chapter). 
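CMOV support is reported in CPUID leaf 01H as EDX bit 15. The sketch below shows one way to test that bit. It assumes a GCC- or Clang-compatible compiler that provides the `<cpuid.h>` helper `__get_cpuid`. `edx_reports_cmov` and `cpu_has_cmov` are hypothetical names. On other compilers or non-x86 targets, the runtime query in this sketch simply reports false.

```c
#include <stdbool.h>
#include <stdint.h>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <cpuid.h>
#define HAVE_CPUID_H 1
#endif

/* CPUID.01H:EDX bit 15 reports support for CMOVcc. */
static bool edx_reports_cmov(uint32_t edx_leaf1)
{
    return (edx_leaf1 >> 15) & 1u;
}

/* Query the running processor; returns false when CPUID is not
 * available to this build (assumption of this sketch). */
static bool cpu_has_cmov(void)
{
#ifdef HAVE_CPUID_H
    unsigned int eax, ebx, ecx, edx;
    if (__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return edx_reports_cmov(edx);
#endif
    return false;
}
```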

In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to additional
registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the
beginning of this section for encoding data and limits.

Operation 

temp ← SRC

IF condition TRUE
    THEN
        DEST ← temp;
    ELSE
        IF (OperandSize = 32 and IA-32e mode active)
            THEN
                DEST[63:32] ← 0;
        FI;
FI;
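The following C sketch models the Operation pseudocode for a register destination in 64-bit mode. `cmov_model` is a hypothetical helper, and the 16-bit operand size is deliberately left out. The model makes two points explicit. First, the source is read whether or not the condition holds (temp ← SRC). Second, with a 32-bit operand size, the upper half of the destination register is cleared even when no move takes place.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of CMOVcc with a register destination in 64-bit
 * mode. rex_w selects a 64-bit operand size; otherwise 32-bit. */
static uint64_t cmov_model(uint64_t dest, uint64_t src, bool condition, bool rex_w)
{
    uint64_t temp = src;   /* temp <- SRC: read regardless of the condition */

    if (rex_w)
        return condition ? temp : dest;

    /* 32-bit operand size: the result is zero-extended, and DEST[63:32]
     * is cleared even when the condition is false. */
    return condition ? (uint32_t)temp : (uint32_t)dest;
}
```

Under this model, `cmov_model(0xFFFFFFFF00000005, 7, false, false)` yields 5: the low 32 bits are kept and bits 63:32 are cleared.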
Flags Affected

None. 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
CMP—Compare Two Operands


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
3C ib | CMP AL, imm8 | I | Valid | Valid | Compare imm8 with AL.
3D iw | CMP AX, imm16 | I | Valid | Valid | Compare imm16 with AX.
3D id | CMP EAX, imm32 | I | Valid | Valid | Compare imm32 with EAX.
REX.W + 3D id | CMP RAX, imm32 | I | Valid | N.E. | Compare imm32 sign-extended to 64-bits with RAX.
80 /7 ib | CMP r/m8, imm8 | MI | Valid | Valid | Compare imm8 with r/m8.
REX + 80 /7 ib | CMP r/m8*, imm8 | MI | Valid | N.E. | Compare imm8 with r/m8.
81 /7 iw | CMP r/m16, imm16 | MI | Valid | Valid | Compare imm16 with r/m16.
81 /7 id | CMP r/m32, imm32 | MI | Valid | Valid | Compare imm32 with r/m32.
REX.W + 81 /7 id | CMP r/m64, imm32 | MI | Valid | N.E. | Compare imm32 sign-extended to 64-bits with r/m64.
83 /7 ib | CMP r/m16, imm8 | MI | Valid | Valid | Compare imm8 with r/m16.
83 /7 ib | CMP r/m32, imm8 | MI | Valid | Valid | Compare imm8 with r/m32.
REX.W + 83 /7 ib | CMP r/m64, imm8 | MI | Valid | N.E. | Compare imm8 with r/m64.
38 /r | CMP r/m8, r8 | MR | Valid | Valid | Compare r8 with r/m8.
REX + 38 /r | CMP r/m8*, r8* | MR | Valid | N.E. | Compare r8 with r/m8.
39 /r | CMP r/m16, r16 | MR | Valid | Valid | Compare r16 with r/m16.
39 /r | CMP r/m32, r32 | MR | Valid | Valid | Compare r32 with r/m32.
REX.W + 39 /r | CMP r/m64, r64 | MR | Valid | N.E. | Compare r64 with r/m64.
3A /r | CMP r8, r/m8 | RM | Valid | Valid | Compare r/m8 with r8.
REX + 3A /r | CMP r8*, r/m8* | RM | Valid | N.E. | Compare r/m8 with r8.
3B /r | CMP r16, r/m16 | RM | Valid | Valid | Compare r/m16 with r16.
3B /r | CMP r32, r/m32 | RM | Valid | Valid | Compare r/m32 with r32.
REX.W + 3B /r | CMP r64, r/m64 | RM | Valid | N.E. | Compare r/m64 with r64.


NOTES: 

* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r) | ModRM:r/m (r) | NA | NA
MR | ModRM:r/m (r) | ModRM:reg (r) | NA | NA
MI | ModRM:r/m (r) | imm8/16/32 | NA | NA
I | AL/AX/EAX/RAX (r) | imm8/16/32 | NA | NA


Description 

Compares the first source operand with the second source operand and sets the status flags in the EFLAGS register 
according to the results. The comparison is performed by subtracting the second operand from the first operand 
and then setting the status flags in the same manner as the SUB instruction. When an immediate value is used as 
an operand, it is sign-extended to the length of the first operand. 
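The sign extension of immediates can be illustrated with the two hypothetical helpers below. They model the 83 /7 ib form (imm8 extended to 32 bits) and the REX.W + 81 /7 id form (imm32 extended to 64 bits). The helpers use explicit bit operations so the result does not depend on implementation-defined signed conversions in C.

```c
#include <stdint.h>

/* Sign-extend an 8-bit immediate to 32 bits (e.g. CMP r/m32, imm8). */
static uint32_t sext_imm8_to_32(uint8_t imm8)
{
    return (imm8 & 0x80u) ? (0xFFFFFF00u | imm8) : imm8;
}

/* Sign-extend a 32-bit immediate to 64 bits (e.g. CMP r/m64, imm32). */
static uint64_t sext_imm32_to_64(uint32_t imm32)
{
    return (imm32 & 0x80000000u) ? (UINT64_C(0xFFFFFFFF00000000) | imm32) : imm32;
}
```

Under this model, an imm8 byte of FFH is compared as FFFFFFFFH (-1), not as 000000FFH (255).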

The condition codes used by the Jcc, CMOVcc, and SETcc instructions are based on the results of a CMP instruction. 
Appendix B, "EFLAGS Condition Codes," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, 
Volume 1, shows the relationship of the status flags and the condition codes. 
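A minimal C model of the flags that CMP produces for 32-bit operands is sketched below. `cmp32`, `below`, and `less` are hypothetical names. The model shows why "below" (CF=1) is the unsigned test while "less" (SF≠OF) is the signed test. For CMP FFFFFFFFH, 1, the first operand is not below the second but is less than it, because FFFFFFFFH is -1 when interpreted as a signed value.

```c
#include <stdbool.h>
#include <stdint.h>

struct cmp_flags { bool cf, pf, af, zf, sf, of; };

/* Flags set by CMP a, b on 32-bit operands: computed as for SUB,
 * but the difference itself is discarded. */
static struct cmp_flags cmp32(uint32_t a, uint32_t b)
{
    uint32_t r = a - b;                        /* wraps modulo 2^32 */
    struct cmp_flags f;
    f.cf = a < b;                              /* borrow out of bit 31 */
    f.zf = r == 0;
    f.sf = (r >> 31) & 1u;
    f.of = (((a ^ b) & (a ^ r)) >> 31) & 1u;   /* signed overflow */
    f.af = ((a ^ b ^ r) >> 4) & 1u;            /* borrow out of bit 3 */
    /* PF is set when the low byte of the result has an even number of 1 bits. */
    unsigned low = r & 0xFFu, ones = 0;
    while (low) { ones += low & 1u; low >>= 1; }
    f.pf = (ones % 2) == 0;
    return f;
}

/* "below": unsigned test (CF=1). "less": signed test (SF != OF). */
static bool below(struct cmp_flags f) { return f.cf; }
static bool less(struct cmp_flags f)  { return f.sf != f.of; }
```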
In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to additional
registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the
beginning of this section for encoding data and limits.

Operation 

temp ← SRC1 - SignExtend(SRC2);

ModifyStatusFlags; (* Modify status flags in the same manner as the SUB instruction *)

Flags Affected 

The CF, OF, SF, ZF, AF, and PF flags are set according to the result. 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
CMPPD—Compare Packed Double-Precision Floating-Point Values


Opcode | Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F C2 /r ib | CMPPD xmm1, xmm2/m128, imm8 | RMI | V/V | SSE2 | Compare packed double-precision floating-point values in xmm2/m128 and xmm1 using bits 2:0 of imm8 as a comparison predicate.
VEX.NDS.128.66.0F.WIG C2 /r ib | VCMPPD xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Compare packed double-precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a comparison predicate.
VEX.NDS.256.66.0F.WIG C2 /r ib | VCMPPD ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Compare packed double-precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a comparison predicate.
EVEX.NDS.128.66.0F.W1 C2 /r ib | VCMPPD k1 {k2}, xmm2, xmm3/m128/m64bcst, imm8 | FV | V/V | AVX512VL AVX512F | Compare packed double-precision floating-point values in xmm3/m128/m64bcst and xmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1.
EVEX.NDS.256.66.0F.W1 C2 /r ib | VCMPPD k1 {k2}, ymm2, ymm3/m256/m64bcst, imm8 | FV | V/V | AVX512VL AVX512F | Compare packed double-precision floating-point values in ymm3/m256/m64bcst and ymm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1.
EVEX.NDS.512.66.0F.W1 C2 /r ib | VCMPPD k1 {k2}, zmm2, zmm3/m512/m64bcst{sae}, imm8 | FV | V/V | AVX512F | Compare packed double-precision floating-point values in zmm3/m512/m64bcst and zmm2 using bits 4:0 of imm8 as a comparison predicate with writemask k2 and leave the result in mask register k1.


Instruction Operand Encoding 


Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | Imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | Imm8
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | Imm8


Description 

Performs a SIMD compare of the packed double-precision floating-point values in the second source operand and 
the first source operand and returns the results of the comparison to the destination operand. The comparison 
predicate operand (immediate byte) specifies the type of comparison performed on each pair of packed values in 
the two source operands. 

EVEX encoded versions: The first source operand (second operand) is a ZMM/YMM/XMM register. The second 
source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector 
broadcasted from a 64-bit memory location. The destination operand (first operand) is an opmask register. 
Comparison results are written to the destination operand under the writemask k2. Each comparison result is a 
single mask bit of 1 (comparison true) or 0 (comparison false). 

VEX.256 encoded version: The first source operand (second operand) is a YMM register. The second source 
operand (third operand) can be a YMM register or a 256-bit memory location. The destination operand (first 
operand) is a YMM register. Four comparisons are performed with results written to the destination operand. The 
result of each comparison is a quadword mask of all 1s (comparison true) or all 0s (comparison false).

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The 
second source operand (second operand) can be an XMM register or 128-bit memory location. Bits (MAX_VL-1:128)
of the corresponding ZMM destination register remain unchanged. Two comparisons are performed with
results written to bits 127:0 of the destination operand. The result of each comparison is a quadword mask of all
1s (comparison true) or all 0s (comparison false).
VEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source
operand (third operand) can be an XMM register or a 128-bit memory location. Bits (MAX_VL-1:128) of the destination
ZMM register are zeroed. Two comparisons are performed with results written to bits 127:0 of the destination
operand.

The comparison predicate operand is an 8-bit immediate: 

• For instructions encoded using the VEX or EVEX prefix, bits 4:0 define the type of comparison to be performed 
(see Table 3-1). Bits 5 through 7 of the immediate are reserved. 

• For instruction encodings that use neither the VEX nor the EVEX prefix, bits 2:0 define the type of comparison to be made
(see the first 8 rows of Table 3-1). Bits 3 through 7 of the immediate are reserved.


Table 3-1. Comparison Predicate for CMPPD and CMPPS Instructions 


Result columns: A is the 1st operand, B is the 2nd operand.

Predicate | imm8 Value | Description | A>B | A<B | A=B | Unordered(1) | Signals #IA on QNaN
EQ_OQ (EQ) | 0H | Equal (ordered, non-signaling) | False | False | True | False | No
LT_OS (LT) | 1H | Less-than (ordered, signaling) | False | True | False | False | Yes
LE_OS (LE) | 2H | Less-than-or-equal (ordered, signaling) | False | True | True | False | Yes
UNORD_Q (UNORD) | 3H | Unordered (non-signaling) | False | False | False | True | No
NEQ_UQ (NEQ) | 4H | Not-equal (unordered, non-signaling) | True | True | False | True | No
NLT_US (NLT) | 5H | Not-less-than (unordered, signaling) | True | False | True | True | Yes
NLE_US (NLE) | 6H | Not-less-than-or-equal (unordered, signaling) | True | False | False | True | Yes
ORD_Q (ORD) | 7H | Ordered (non-signaling) | True | True | True | False | No
EQ_UQ | 8H | Equal (unordered, non-signaling) | False | False | True | True | No
NGE_US (NGE) | 9H | Not-greater-than-or-equal (unordered, signaling) | False | True | False | True | Yes
NGT_US (NGT) | AH | Not-greater-than (unordered, signaling) | False | True | True | True | Yes
FALSE_OQ (FALSE) | BH | False (ordered, non-signaling) | False | False | False | False | No
NEQ_OQ | CH | Not-equal (ordered, non-signaling) | True | True | False | False | No
GE_OS (GE) | DH | Greater-than-or-equal (ordered, signaling) | True | False | True | False | Yes
GT_OS (GT) | EH | Greater-than (ordered, signaling) | True | False | False | False | Yes
TRUE_UQ (TRUE) | FH | True (unordered, non-signaling) | True | True | True | True | No
EQ_OS | 10H | Equal (ordered, signaling) | False | False | True | False | Yes
LT_OQ | 11H | Less-than (ordered, non-signaling) | False | True | False | False | No
LE_OQ | 12H | Less-than-or-equal (ordered, non-signaling) | False | True | True | False | No
UNORD_S | 13H | Unordered (signaling) | False | False | False | True | Yes
NEQ_US | 14H | Not-equal (unordered, signaling) | True | True | False | True | Yes
NLT_UQ | 15H | Not-less-than (unordered, non-signaling) | True | False | True | True | No
NLE_UQ | 16H | Not-less-than-or-equal (unordered, non-signaling) | True | False | False | True | No
ORD_S | 17H | Ordered (signaling) | True | True | True | False | Yes
EQ_US | 18H | Equal (unordered, signaling) | False | False | True | True | Yes
NGE_UQ | 19H | Not-greater-than-or-equal (unordered, non-signaling) | False | True | False | True | No
NGT_UQ | 1AH | Not-greater-than (unordered, non-signaling) | False | True | True | True | No
FALSE_OS | 1BH | False (ordered, signaling) | False | False | False | False | Yes
NEQ_OS | 1CH | Not-equal (ordered, signaling) | True | True | False | False | Yes
GE_OQ | 1DH | Greater-than-or-equal (ordered, non-signaling) | True | False | True | False | No
GT_OQ | 1EH | Greater-than (ordered, non-signaling) | True | False | False | False | No
TRUE_US | 1FH | True (unordered, signaling) | True | True | True | True | Yes

NOTES:

1. If either operand A or B is a NaN.


The unordered relationship is true when at least one of the two source operands being compared is a NaN; the 
ordered relationship is true when neither source operand is a NaN. 
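The result columns of Table 3-1 can be captured in a small lookup table. The C sketch below is a hypothetical scalar model of one element comparison. It does not model exception delivery, and an SNaN operand signals for every predicate. The sketch also encodes an observation that can be read off Table 3-1: predicates n and n+10H produce identical results and differ only in the "Signals #IA on QNaN" column.

```c
#include <math.h>
#include <stdbool.h>

/* Relation of A (1st operand) to B (2nd operand). */
enum { REL_GT = 1, REL_LT = 2, REL_EQ = 4, REL_UNORD = 8 };

/* For imm8[3:0], the relations for which the predicate is true,
 * transcribed from the result columns of Table 3-1. */
static const unsigned char true_for[16] = {
    REL_EQ,                               /* 0 EQ     */
    REL_LT,                               /* 1 LT     */
    REL_LT | REL_EQ,                      /* 2 LE     */
    REL_UNORD,                            /* 3 UNORD  */
    REL_GT | REL_LT | REL_UNORD,          /* 4 NEQ    */
    REL_GT | REL_EQ | REL_UNORD,          /* 5 NLT    */
    REL_GT | REL_UNORD,                   /* 6 NLE    */
    REL_GT | REL_LT | REL_EQ,             /* 7 ORD    */
    REL_EQ | REL_UNORD,                   /* 8 EQ_UQ  */
    REL_LT | REL_UNORD,                   /* 9 NGE    */
    REL_LT | REL_EQ | REL_UNORD,          /* A NGT    */
    0,                                    /* B FALSE  */
    REL_GT | REL_LT,                      /* C NEQ_OQ */
    REL_GT | REL_EQ,                      /* D GE     */
    REL_GT,                               /* E GT     */
    REL_GT | REL_LT | REL_EQ | REL_UNORD  /* F TRUE   */
};

static unsigned relation(double a, double b)
{
    if (isnan(a) || isnan(b)) return REL_UNORD;
    if (a > b) return REL_GT;
    if (a < b) return REL_LT;
    return REL_EQ;
}

/* Result of predicate imm5 (0..31) for one pair of elements. */
static bool predicate_result(unsigned imm5, double a, double b)
{
    return (true_for[imm5 & 0xFu] & relation(a, b)) != 0;
}

/* Signals #IA on QNaN: for 0..FH the answer is "Yes" when imm8[3:0] is
 * 1, 2, 5, 6, 9, A, D, or E (mask 0x6666); imm8[4] inverts it. */
static bool signals_on_qnan(unsigned imm5)
{
    bool low = (0x6666u >> (imm5 & 0xFu)) & 1u;
    return (imm5 & 0x10u) ? !low : low;
}
```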

A subsequent computational instruction that uses the mask result in the destination operand as an input operand
will not generate an exception, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask
of all 1s corresponds to a QNaN.
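The statement above can be checked by reinterpreting the 64-bit mask patterns as IEEE 754 double-precision values. This sketch assumes the C `double` type is IEEE 754 binary64, which holds on Intel 64 and IA-32 toolchains. `mask_as_double` and `is_quiet_nan_bits` are hypothetical helpers.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 64-bit mask element as a double, as a later
 * floating-point instruction would see it. */
static double mask_as_double(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* A binary64 NaN has an all-ones exponent and a nonzero fraction;
 * it is quiet when the most significant fraction bit (bit 51) is set. */
static int is_quiet_nan_bits(uint64_t bits)
{
    uint64_t exp  = (bits >> 52) & 0x7FFu;
    uint64_t frac = bits & ((UINT64_C(1) << 52) - 1);
    return exp == 0x7FF && frac != 0 && ((bits >> 51) & 1u);
}
```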

Note that processors with CPUID.1H:ECX.AVX = 0 do not implement the "greater-than", "greater-than-or-equal",
"not-greater-than", and "not-greater-than-or-equal" relation predicates. These comparisons can be made either
by using the inverse relationship (that is, use "not-less-than-or-equal" to make a "greater-than" comparison)
or by using software emulation. When using software emulation, the program must swap the operands (copying
registers when necessary to protect the data that will now be in the destination), and then perform the compare
using a different predicate. The predicate to be used for these emulations is listed in the first 8 rows of Table 3-7
(Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A) under the heading Emulation.
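The difference between the two workarounds matters only when a NaN is involved. The hypothetical scalar sketch below compares them. Swapping the operands of LT_OS (1H) reproduces GT_OS (EH) for every input. The inverse relationship NLE_US (6H) agrees with GT_OS only for ordered inputs, because NLE is true when either operand is a NaN.

```c
#include <math.h>
#include <stdbool.h>

/* Hypothetical scalar stand-ins for three predicates of Table 3-1. */
static bool pred_lt_os(double a, double b)  { return !isnan(a) && !isnan(b) && a < b; }   /* 1H */
static bool pred_nle_us(double a, double b) { return isnan(a) || isnan(b) || !(a <= b); } /* 6H */
/* EH: not encodable with legacy CMPPD, which uses only imm8 bits 2:0. */
static bool pred_gt_os(double a, double b)  { return !isnan(a) && !isnan(b) && a > b; }

/* Emulation by operand swap: matches GT_OS for every input. */
static bool gt_by_swap(double a, double b) { return pred_lt_os(b, a); }

/* Emulation by inverse relationship: matches GT_OS only for ordered inputs. */
static bool gt_by_nle(double a, double b) { return pred_nle_us(a, b); }
```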

Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand
CMPPD instruction, for processors with CPUID.1H:ECX.AVX = 0. See Table 3-2. Compilers should treat reserved
imm8 values as illegal syntax.


Table 3-2. Pseudo-Op and CMPPD Implementation 


Pseudo-Op | CMPPD Implementation
CMPEQPD xmm1, xmm2 | CMPPD xmm1, xmm2, 0
CMPLTPD xmm1, xmm2 | CMPPD xmm1, xmm2, 1
CMPLEPD xmm1, xmm2 | CMPPD xmm1, xmm2, 2
CMPUNORDPD xmm1, xmm2 | CMPPD xmm1, xmm2, 3
CMPNEQPD xmm1, xmm2 | CMPPD xmm1, xmm2, 4
CMPNLTPD xmm1, xmm2 | CMPPD xmm1, xmm2, 5
CMPNLEPD xmm1, xmm2 | CMPPD xmm1, xmm2, 6
CMPORDPD xmm1, xmm2 | CMPPD xmm1, xmm2, 7


The greater-than relations that the processor does not implement require more than one instruction to emulate in 
software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the 
operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to 
the correct destination register and that the source operand is left intact.) 

Processors with CPUID.1H:ECX.AVX = 1 implement the full complement of 32 predicates shown in Table 3-3, so software
emulation is no longer needed. Compilers and assemblers may implement the following three-operand
pseudo-ops in addition to the four-operand VCMPPD instruction. See Table 3-3, where the notations reg1, reg2,
and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved imm8 values as illegal
syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic
interface. Compilers and assemblers may implement three-operand pseudo-ops for EVEX encoded VCMPPD instructions
in a similar fashion by extending the syntax listed in Table 3-3.
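An assembler that accepts the Table 3-3 pseudo-ops needs a mapping from predicate suffix to imm8 value. The table-driven C sketch below shows one hypothetical way to build it. `vcmppd_pseudo_op_imm` is not a real assembler API. Unknown suffixes are rejected, in line with the rule that reserved imm8 values are illegal syntax.

```c
#include <string.h>

/* Predicate suffixes of the Table 3-3 pseudo-ops, indexed by imm8 value. */
static const char *const vcmp_suffix[32] = {
    "EQ", "LT", "LE", "UNORD", "NEQ", "NLT", "NLE", "ORD",
    "EQ_UQ", "NGE", "NGT", "FALSE", "NEQ_OQ", "GE", "GT", "TRUE",
    "EQ_OS", "LT_OQ", "LE_OQ", "UNORD_S", "NEQ_US", "NLT_UQ", "NLE_UQ", "ORD_S",
    "EQ_US", "NGE_UQ", "NGT_UQ", "FALSE_OS", "NEQ_OS", "GE_OQ", "GT_OQ", "TRUE_US",
};

/* Map a pseudo-op such as "VCMPGT_OQPD" to the imm8 of the equivalent
 * VCMPPD instruction; returns -1 if the mnemonic is not in the table. */
static int vcmppd_pseudo_op_imm(const char *mnemonic)
{
    size_t n = strlen(mnemonic);
    if (n <= 6 || strncmp(mnemonic, "VCMP", 4) != 0 || strcmp(mnemonic + n - 2, "PD") != 0)
        return -1;
    size_t suffix_len = n - 6;                 /* strip "VCMP" and "PD" */
    for (int imm = 0; imm < 32; imm++) {
        if (strlen(vcmp_suffix[imm]) == suffix_len &&
            strncmp(mnemonic + 4, vcmp_suffix[imm], suffix_len) == 0)
            return imm;
    }
    return -1;
}
```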


Table 3-3. Pseudo-Op and VCMPPD Implementation 


Pseudo-Op | VCMPPD Implementation
VCMPEQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0
VCMPLTPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1
VCMPLEPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 2
VCMPUNORDPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 3
VCMPNEQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 4
VCMPNLTPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 5
VCMPNLEPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 6
VCMPORDPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 7
VCMPEQ_UQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 8
VCMPNGEPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 9
VCMPNGTPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0AH
VCMPFALSEPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0BH
VCMPNEQ_OQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0CH
VCMPGEPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0DH
VCMPGTPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0EH
VCMPTRUEPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 0FH
VCMPEQ_OSPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 10H
VCMPLT_OQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 11H
VCMPLE_OQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 12H
VCMPUNORD_SPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 13H
VCMPNEQ_USPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 14H
VCMPNLT_UQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 15H
VCMPNLE_UQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 16H
VCMPORD_SPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 17H
VCMPEQ_USPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 18H
VCMPNGE_UQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 19H
VCMPNGT_UQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1AH
VCMPFALSE_OSPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1BH
VCMPNEQ_OSPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1CH
VCMPGE_OQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1DH
VCMPGT_OQPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1EH
VCMPTRUE_USPD reg1, reg2, reg3 | VCMPPD reg1, reg2, reg3, 1FH
Operation

CASE (COMPARISON PREDICATE) OF
    0: OP3 ← EQ_OQ; OP5 ← EQ_OQ;
    1: OP3 ← LT_OS; OP5 ← LT_OS;
    2: OP3 ← LE_OS; OP5 ← LE_OS;
    3: OP3 ← UNORD_Q; OP5 ← UNORD_Q;
    4: OP3 ← NEQ_UQ; OP5 ← NEQ_UQ;
    5: OP3 ← NLT_US; OP5 ← NLT_US;
    6: OP3 ← NLE_US; OP5 ← NLE_US;
    7: OP3 ← ORD_Q; OP5 ← ORD_Q;
    8: OP5 ← EQ_UQ;
    9: OP5 ← NGE_US;
    10: OP5 ← NGT_US;
    11: OP5 ← FALSE_OQ;
    12: OP5 ← NEQ_OQ;
    13: OP5 ← GE_OS;
    14: OP5 ← GT_OS;
    15: OP5 ← TRUE_UQ;
    16: OP5 ← EQ_OS;
    17: OP5 ← LT_OQ;
    18: OP5 ← LE_OQ;
    19: OP5 ← UNORD_S;
    20: OP5 ← NEQ_US;
    21: OP5 ← NLT_UQ;
    22: OP5 ← NLE_UQ;
    23: OP5 ← ORD_S;
    24: OP5 ← EQ_US;
    25: OP5 ← NGE_UQ;
    26: OP5 ← NGT_UQ;
    27: OP5 ← FALSE_OS;
    28: OP5 ← NEQ_OS;
    29: OP5 ← GE_OQ;
    30: OP5 ← GT_OQ;
    31: OP5 ← TRUE_US;
    DEFAULT: Reserved;
ESAC;
VCMPPD (EVEX encoded versions)

(KL, VL) = (2, 128), (4, 256), (8, 512)

FOR j ← 0 TO KL-1
    i ← j * 64
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    CMP ← SRC1[i+63:i] OP5 SRC2[63:0]
                ELSE
                    CMP ← SRC1[i+63:i] OP5 SRC2[i+63:i]
            FI;
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
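The EVEX loop above can be mirrored in C for a single predicate. The sketch assumes an ordered, non-signaling less-than (as with LT_OQ, 11H) and ignores exceptions. `vcmppd_evex_lt` is a hypothetical name. A NULL `k2` stands for "no writemask", and `bcst` models EVEX.b = 1 with a memory source.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of EVEX VCMPPD for KL <= 8 elements with an
 * assumed ordered less-than predicate. Returns the k1 mask value. */
static uint8_t vcmppd_evex_lt(const double *src1, const double *src2, bool bcst,
                              const uint8_t *k2, unsigned kl)
{
    uint8_t dest = 0;                          /* DEST[MAX_KL-1:KL] stays 0 */
    for (unsigned j = 0; j < kl; j++) {
        if (k2 == NULL || ((*k2 >> j) & 1u)) {
            double b = bcst ? src2[0] : src2[j];   /* broadcast reuses element 0 */
            if (src1[j] < b)                       /* NaN compares false: ordered */
                dest |= (uint8_t)(1u << j);
        }
        /* Masked-off elements: DEST[j] = 0 (zeroing-masking only). */
    }
    return dest;
}
```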


VCMPPD (VEX.256 encoded version)

CMP0 ← SRC1[63:0] OP5 SRC2[63:0];
CMP1 ← SRC1[127:64] OP5 SRC2[127:64];
CMP2 ← SRC1[191:128] OP5 SRC2[191:128];
CMP3 ← SRC1[255:192] OP5 SRC2[255:192];
IF CMP0 = TRUE
    THEN DEST[63:0] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[63:0] ← 0000000000000000H; FI;
IF CMP1 = TRUE
    THEN DEST[127:64] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[127:64] ← 0000000000000000H; FI;
IF CMP2 = TRUE
    THEN DEST[191:128] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[191:128] ← 0000000000000000H; FI;
IF CMP3 = TRUE
    THEN DEST[255:192] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[255:192] ← 0000000000000000H; FI;
DEST[MAX_VL-1:256] ← 0


VCMPPD (VEX.128 encoded version)

CMP0 ← SRC1[63:0] OP5 SRC2[63:0];
CMP1 ← SRC1[127:64] OP5 SRC2[127:64];
IF CMP0 = TRUE
    THEN DEST[63:0] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[63:0] ← 0000000000000000H; FI;
IF CMP1 = TRUE
    THEN DEST[127:64] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[127:64] ← 0000000000000000H; FI;
DEST[MAX_VL-1:128] ← 0
CMPPD (128-bit Legacy SSE version)

CMP0 ← SRC1[63:0] OP3 SRC2[63:0];
CMP1 ← SRC1[127:64] OP3 SRC2[127:64];
IF CMP0 = TRUE
    THEN DEST[63:0] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[63:0] ← 0000000000000000H; FI;
IF CMP1 = TRUE
    THEN DEST[127:64] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[127:64] ← 0000000000000000H; FI;
DEST[MAX_VL-1:128] (Unmodified)
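The VEX.128 and 128-bit legacy SSE pseudocode differ only in how they treat bits MAX_VL-1:128. The sketch below models a destination register as eight 64-bit lanes, assuming MAX_VL = 512. `write_cmppd128` is a hypothetical helper.

```c
#include <stdbool.h>
#include <stdint.h>

#define LANES 8  /* assumption: MAX_VL = 512, modeled as 8 quadword lanes */

/* Write the two quadword compare masks to lanes 0 and 1. Legacy SSE
 * CMPPD leaves lanes 2..7 unmodified; VEX.128 VCMPPD zeroes them. */
static void write_cmppd128(uint64_t zmm[LANES], bool cmp0, bool cmp1, bool vex_encoded)
{
    zmm[0] = cmp0 ? UINT64_MAX : 0;
    zmm[1] = cmp1 ? UINT64_MAX : 0;
    if (vex_encoded)
        for (int i = 2; i < LANES; i++)
            zmm[i] = 0;   /* DEST[MAX_VL-1:128] <- 0 */
}
```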

Intel C/C++ Compiler Intrinsic Equivalent 

VCMPPD __mmask8 _mm512_cmp_pd_mask( __m512d a, __m512d b, int imm);

VCMPPD __mmask8 _mm512_cmp_round_pd_mask( __m512d a, __m512d b, int imm, int sae);

VCMPPD __mmask8 _mm512_mask_cmp_pd_mask( __mmask8 k1, __m512d a, __m512d b, int imm);

VCMPPD __mmask8 _mm512_mask_cmp_round_pd_mask( __mmask8 k1, __m512d a, __m512d b, int imm, int sae);

VCMPPD __mmask8 _mm256_cmp_pd_mask( __m256d a, __m256d b, int imm);

VCMPPD __mmask8 _mm256_mask_cmp_pd_mask( __mmask8 k1, __m256d a, __m256d b, int imm);

VCMPPD __mmask8 _mm_cmp_pd_mask( __m128d a, __m128d b, int imm);

VCMPPD __mmask8 _mm_mask_cmp_pd_mask( __mmask8 k1, __m128d a, __m128d b, int imm);

VCMPPD __m256d _mm256_cmp_pd(__m256d a, __m256d b, int imm)

(V)CMPPD __m128d _mm_cmp_pd(__m128d a, __m128d b, int imm)

SIMD Floating-Point Exceptions 

Invalid if SNaN operand and invalid if QNaN and predicate as listed in Table 3-1. 

Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2. 

EVEX-encoded instructions, see Exceptions Type E2. 
CMPPS—Compare Packed Single-Precision Floating-Point Values


0F C2 /r ib
CMPPS xmm1, xmm2/m128, imm8
    Op/En: RMI   64/32 bit Mode Support: V/V   CPUID Feature Flag: SSE
    Compare packed single-precision floating-point values in xmm2/m128 and xmm1 using bits 2:0 of imm8 as a
    comparison predicate.

VEX.NDS.128.0F.WIG C2 /r ib
VCMPPS xmm1, xmm2, xmm3/m128, imm8
    Op/En: RVMI   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX
    Compare packed single-precision floating-point values in xmm3/m128 and xmm2 using bits 4:0 of imm8 as a
    comparison predicate.

VEX.NDS.256.0F.WIG C2 /r ib
VCMPPS ymm1, ymm2, ymm3/m256, imm8
    Op/En: RVMI   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX
    Compare packed single-precision floating-point values in ymm3/m256 and ymm2 using bits 4:0 of imm8 as a
    comparison predicate.

EVEX.NDS.128.0F.W0 C2 /r ib
VCMPPS k1 {k2}, xmm2, xmm3/m128/m32bcst, imm8
    Op/En: FV   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX512VL AVX512F
    Compare packed single-precision floating-point values in xmm3/m128/m32bcst and xmm2 using bits 4:0 of imm8 as
    a comparison predicate with writemask k2 and leave the result in mask register k1.

EVEX.NDS.256.0F.W0 C2 /r ib
VCMPPS k1 {k2}, ymm2, ymm3/m256/m32bcst, imm8
    Op/En: FV   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX512VL AVX512F
    Compare packed single-precision floating-point values in ymm3/m256/m32bcst and ymm2 using bits 4:0 of imm8 as
    a comparison predicate with writemask k2 and leave the result in mask register k1.

EVEX.NDS.512.0F.W0 C2 /r ib
VCMPPS k1 {k2}, zmm2, zmm3/m512/m32bcst{sae}, imm8
    Op/En: FV   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX512F
    Compare packed single-precision floating-point values in zmm3/m512/m32bcst and zmm2 using bits 4:0 of imm8 as
    a comparison predicate with writemask k2 and leave the result in mask register k1.


Instruction Operand Encoding 


Op/En   Operand 1          Operand 2       Operand 3       Operand 4
RMI     ModRM:reg (r, w)   ModRM:r/m (r)   Imm8            NA
RVMI    ModRM:reg (w)      VEX.vvvv        ModRM:r/m (r)   Imm8
FV      ModRM:reg (w)      EVEX.vvvv       ModRM:r/m (r)   Imm8


Description 

Performs a SIMD compare of the packed single-precision floating-point values in the second source operand and 
the first source operand and returns the results of the comparison to the destination operand. The comparison 
predicate operand (immediate byte) specifies the type of comparison performed on each of the pairs of packed 
values. 

EVEX encoded versions: The first source operand (second operand) is a ZMM/YMM/XMM register. The second 
source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector 
broadcasted from a 32-bit memory location. The destination operand (first operand) is an opmask register. 
Comparison results are written to the destination operand under the writemask k2. Each comparison result is a 
single mask bit of 1 (comparison true) or 0 (comparison false). 

VEX.256 encoded version: The first source operand (second operand) is a YMM register. The second source operand 
(third operand) can be a YMM register or a 256-bit memory location. The destination operand (first operand) is a 
YMM register. Eight comparisons are performed with results written to the destination operand. The result of each 
comparison is a doubleword mask of all 1s (comparison true) or all 0s (comparison false).

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The
second source operand (second operand) can be an XMM register or 128-bit memory location. Bits (MAX_VL-1:128)
of the corresponding ZMM destination register remain unchanged. Four comparisons are performed with
results written to bits 127:0 of the destination operand. The result of each comparison is a doubleword mask of all
1s (comparison true) or all 0s (comparison false).
VEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source
operand (third operand) can be an XMM register or a 128-bit memory location. Bits (MAX_VL-1:128) of the
destination ZMM register are zeroed. Four comparisons are performed with results written to bits 127:0 of the
destination operand.

The comparison predicate operand is an 8-bit immediate: 

• For instructions encoded using the VEX prefix and EVEX prefix, bits 4:0 define the type of comparison to be 
performed (see Table 3-1). Bits 5 through 7 of the immediate are reserved. 

• For instruction encodings that do not use VEX prefix, bits 2:0 define the type of comparison to be made (see 
the first 8 rows of Table 3-1). Bits 3 through 7 of the immediate are reserved. 
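As a minimal sketch of the rule above, the hypothetical helper below extracts the predicate field from imm8 (bits 4:0 for VEX and EVEX encodings, bits 2:0 for the legacy encoding) and reports whether any reserved bit is set. It does not model what a particular processor does when reserved bits are nonzero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns the comparison-predicate number held in imm8.
   VEX/EVEX forms use bits 4:0 (32 predicates); legacy SSE uses bits 2:0 (8).
   *reserved_set reports whether any reserved immediate bit is nonzero. */
unsigned cmp_predicate_index(uint8_t imm8, bool vex_or_evex, bool *reserved_set)
{
    uint8_t field_mask = vex_or_evex ? 0x1F : 0x07;
    assert(reserved_set != NULL);
    *reserved_set = (imm8 & (uint8_t)~field_mask) != 0;
    return imm8 & field_mask;
}
```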

The unordered relationship is true when at least one of the two source operands being compared is a NaN; the 
ordered relationship is true when neither source operand is a NaN. 
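In C, the C99 classification macro `isunordered` from `<math.h>` expresses exactly this test. A small sketch, with hypothetical function names standing in for the UNORD and ORD predicates:

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* UNORD: true when at least one operand is a NaN. */
bool cmp_unord(double a, double b) { return isunordered(a, b); }

/* ORD: true when neither operand is a NaN. */
bool cmp_ord(double a, double b) { return !isunordered(a, b); }
```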

A subsequent computational instruction that uses the mask result in the destination operand as an input operand
will not generate an exception, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask
of all 1s corresponds to a QNaN.
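To see why these masks are harmless as floating-point inputs, the sketch below reinterprets the two 32-bit lane patterns as single-precision values. It assumes `float` is the IEEE 754 binary32 format, which holds for mainstream compilers targeting Intel 64 and IA-32 but is not guaranteed by the C standard; the helper names are illustrative.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Reinterprets a 32-bit lane pattern as a single-precision value. */
float mask_as_float(uint32_t bits)
{
    float f;
    assert(sizeof f == sizeof bits);  /* assumes a 32-bit float */
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Exponent field all 1s and fraction bit 22 set: the quiet-NaN encoding. */
bool is_quiet_nan_bits(uint32_t bits)
{
    return (bits & 0x7F800000u) == 0x7F800000u && (bits & 0x00400000u) != 0;
}
```

An all-0s lane decodes to +0.0 (sign bit clear), and an all-1s lane has an all-1s exponent with the top fraction bit set, which is a QNaN rather than an SNaN.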

Note that processors with "CPUID.1H:ECX.AVX = 0" do not implement the "greater-than", "greater-than-or-equal",
"not-greater-than", and "not-greater-than-or-equal" relation predicates. These comparisons can be made either
by using the inverse relationship (that is, use the "not-less-than-or-equal" to make a "greater-than" comparison)
or by using software emulation. When using software emulation, the program must swap the operands (copying
registers when necessary to protect the data that will now be in the destination), and then perform the compare
using a different predicate. The predicate to be used for these emulations is listed in the first 8 rows of Table 3-7
(Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A) under the heading Emulation.
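The two approaches can give different answers when a NaN is involved. The sketch below uses hypothetical helpers, with C operators standing in for the LT_OS and NLE_US predicates. Swapping operands reproduces a > b exactly, while the inverse relation NLE returns true for unordered inputs; this follows from the truth tables of the two predicates.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Stand-ins for two predicates available without AVX. */
static bool cmp_lt(float a, float b)  { return a < b; }     /* LT_OS: false if unordered */
static bool cmp_nle(float a, float b) { return !(a <= b); } /* NLE_US: true if unordered */

/* GT by swapping operands: a > b is the same test as b < a,
   including the false result when either operand is a NaN. */
bool cmp_gt_by_swap(float a, float b) { return cmp_lt(b, a); }

/* GT via the inverse relation NLE: matches a > b for ordered inputs,
   but returns true when either operand is a NaN. */
bool cmp_gt_by_inverse(float a, float b) { return cmp_nle(a, b); }
```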

Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand
CMPPS instruction, for processors with "CPUID.1H:ECX.AVX = 0". See Table 3-4. Compilers should treat reserved
Imm8 values as illegal syntax.


Table 3-4. Pseudo-Op and CMPPS Implementation 


Pseudo-Op                   CMPPS Implementation
CMPEQPS xmm1, xmm2          CMPPS xmm1, xmm2, 0
CMPLTPS xmm1, xmm2          CMPPS xmm1, xmm2, 1
CMPLEPS xmm1, xmm2          CMPPS xmm1, xmm2, 2
CMPUNORDPS xmm1, xmm2       CMPPS xmm1, xmm2, 3
CMPNEQPS xmm1, xmm2         CMPPS xmm1, xmm2, 4
CMPNLTPS xmm1, xmm2         CMPPS xmm1, xmm2, 5
CMPNLEPS xmm1, xmm2         CMPPS xmm1, xmm2, 6
CMPORDPS xmm1, xmm2         CMPPS xmm1, xmm2, 7
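An assembler front end could resolve these pseudo-ops with a simple table lookup. The following is a minimal sketch under that assumption; the function name is hypothetical, and mnemonics outside Table 3-4 (such as the unimplemented greater-than forms) are rejected with -1.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical lookup: maps a two-operand CMPxxPS pseudo-op to the imm8
   of the equivalent three-operand CMPPS (Table 3-4); -1 if not listed. */
int cmpps_pseudo_op_imm8(const char *mnemonic)
{
    static const char *const names[8] = {
        "CMPEQPS",  "CMPLTPS",  "CMPLEPS",  "CMPUNORDPS",
        "CMPNEQPS", "CMPNLTPS", "CMPNLEPS", "CMPORDPS",
    };
    assert(mnemonic != 0);
    for (int i = 0; i < 8; i++)
        if (strcmp(mnemonic, names[i]) == 0)
            return i;  /* the table index is the predicate immediate */
    return -1;
}
```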


The greater-than relations that the processor does not implement require more than one instruction to emulate in 
software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the 
operands of the corresponding less-than relations and use move instructions to ensure that the mask is moved to
the correct destination register and that the source operand is left intact.) 

Processors with "CPUID.1H:ECX.AVX = 1" implement the full complement of 32 predicates shown in Table 3-5, so
software emulation is no longer needed. Compilers and assemblers may implement the following three-operand
pseudo-ops in addition to the four-operand VCMPPS instruction. See Table 3-5, where the notations reg1, reg2,
and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved Imm8 values as illegal
syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic
interface. Compilers and assemblers may implement three-operand pseudo-ops for EVEX encoded VCMPPS instructions
in a similar fashion by extending the syntax listed in Table 3-5.
Table 3-5. Pseudo-Op and VCMPPS Implementation


Pseudo-Op                          CMPPS Implementation
VCMPEQPS reg1, reg2, reg3          VCMPPS reg1, reg2, reg3, 0
VCMPLTPS reg1, reg2, reg3          VCMPPS reg1, reg2, reg3, 1
VCMPLEPS reg1, reg2, reg3          VCMPPS reg1, reg2, reg3, 2
VCMPUNORDPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 3
VCMPNEQPS reg1, reg2, reg3         VCMPPS reg1, reg2, reg3, 4
VCMPNLTPS reg1, reg2, reg3         VCMPPS reg1, reg2, reg3, 5
VCMPNLEPS reg1, reg2, reg3         VCMPPS reg1, reg2, reg3, 6
VCMPORDPS reg1, reg2, reg3         VCMPPS reg1, reg2, reg3, 7
VCMPEQ_UQPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 8
VCMPNGEPS reg1, reg2, reg3         VCMPPS reg1, reg2, reg3, 9
VCMPNGTPS reg1, reg2, reg3         VCMPPS reg1, reg2, reg3, 0AH
VCMPFALSEPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 0BH
VCMPNEQ_OQPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 0CH
VCMPGEPS reg1, reg2, reg3          VCMPPS reg1, reg2, reg3, 0DH
VCMPGTPS reg1, reg2, reg3          VCMPPS reg1, reg2, reg3, 0EH
VCMPTRUEPS reg1, reg2, reg3        VCMPPS reg1, reg2, reg3, 0FH
VCMPEQ_OSPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 10H
VCMPLT_OQPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 11H
VCMPLE_OQPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 12H
VCMPUNORD_SPS reg1, reg2, reg3     VCMPPS reg1, reg2, reg3, 13H
VCMPNEQ_USPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 14H
VCMPNLT_UQPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 15H
VCMPNLE_UQPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 16H
VCMPORD_SPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 17H
VCMPEQ_USPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 18H
VCMPNGE_UQPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 19H
VCMPNGT_UQPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 1AH
VCMPFALSE_OSPS reg1, reg2, reg3    VCMPPS reg1, reg2, reg3, 1BH
VCMPNEQ_OSPS reg1, reg2, reg3      VCMPPS reg1, reg2, reg3, 1CH
VCMPGE_OQPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 1DH
VCMPGT_OQPS reg1, reg2, reg3       VCMPPS reg1, reg2, reg3, 1EH
VCMPTRUE_USPS reg1, reg2, reg3     VCMPPS reg1, reg2, reg3, 1FH
Operation

CASE (COMPARISON PREDICATE) OF
    0: OP3 ← EQ_OQ; OP5 ← EQ_OQ;
    1: OP3 ← LT_OS; OP5 ← LT_OS;
    2: OP3 ← LE_OS; OP5 ← LE_OS;
    3: OP3 ← UNORD_Q; OP5 ← UNORD_Q;
    4: OP3 ← NEQ_UQ; OP5 ← NEQ_UQ;
    5: OP3 ← NLT_US; OP5 ← NLT_US;
    6: OP3 ← NLE_US; OP5 ← NLE_US;
    7: OP3 ← ORD_Q; OP5 ← ORD_Q;
    8: OP5 ← EQ_UQ;
    9: OP5 ← NGE_US;
    10: OP5 ← NGT_US;
    11: OP5 ← FALSE_OQ;
    12: OP5 ← NEQ_OQ;
    13: OP5 ← GE_OS;
    14: OP5 ← GT_OS;
    15: OP5 ← TRUE_UQ;
    16: OP5 ← EQ_OS;
    17: OP5 ← LT_OQ;
    18: OP5 ← LE_OQ;
    19: OP5 ← UNORD_S;
    20: OP5 ← NEQ_US;
    21: OP5 ← NLT_UQ;
    22: OP5 ← NLE_UQ;
    23: OP5 ← ORD_S;
    24: OP5 ← EQ_US;
    25: OP5 ← NGE_UQ;
    26: OP5 ← NGT_UQ;
    27: OP5 ← FALSE_OS;
    28: OP5 ← NEQ_OS;
    29: OP5 ← GE_OQ;
    30: OP5 ← GT_OQ;
    31: OP5 ← TRUE_US;
    DEFAULT: Reserved
ESAC;
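The 32 predicates can be summarized by which relation (less than, equal, greater than, unordered) makes each one true, plus whether it signals on a QNaN. The C sketch below is a hypothetical software model of OP5 built that way: predicates 16 through 31 reuse the truth sets of predicates 0 through 15 with the signaling behavior swapped, as the CASE list shows. It models only the comparison result, not exception reporting.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>

/* Hypothetical one-hot encoding of how two operands relate. */
enum { REL_LT = 1, REL_EQ = 2, REL_GT = 4, REL_UN = 8 };

/* Relations for which predicates 0..15 are true (16..31 reuse these). */
static const unsigned char kTruth[16] = {
    REL_EQ,                            /*  0 EQ_OQ    */
    REL_LT,                            /*  1 LT_OS    */
    REL_LT | REL_EQ,                   /*  2 LE_OS    */
    REL_UN,                            /*  3 UNORD_Q  */
    REL_LT | REL_GT | REL_UN,          /*  4 NEQ_UQ   */
    REL_EQ | REL_GT | REL_UN,          /*  5 NLT_US   */
    REL_GT | REL_UN,                   /*  6 NLE_US   */
    REL_LT | REL_EQ | REL_GT,          /*  7 ORD_Q    */
    REL_EQ | REL_UN,                   /*  8 EQ_UQ    */
    REL_LT | REL_UN,                   /*  9 NGE_US   */
    REL_LT | REL_EQ | REL_UN,          /* 10 NGT_US   */
    0,                                 /* 11 FALSE_OQ */
    REL_LT | REL_GT,                   /* 12 NEQ_OQ   */
    REL_EQ | REL_GT,                   /* 13 GE_OS    */
    REL_GT,                            /* 14 GT_OS    */
    REL_LT | REL_EQ | REL_GT | REL_UN, /* 15 TRUE_UQ  */
};

static unsigned relation(float a, float b)
{
    if (isunordered(a, b)) return REL_UN;
    if (a < b) return REL_LT;
    if (a > b) return REL_GT;
    return REL_EQ;
}

/* Result of predicate imm8[4:0] applied to (a, b). */
bool cmp_predicate(float a, float b, unsigned imm8)
{
    assert((imm8 & ~0x1Fu) == 0);  /* bits 7:5 are reserved */
    return (kTruth[imm8 & 0x0Fu] & relation(a, b)) != 0;
}

/* True for the _S/_OS/_US predicates, which signal Invalid on a QNaN.
   Among 0..15 these are exactly the ones with imm8[1:0] = 1 or 2;
   bit 4 flips the choice for 16..31. */
bool cmp_signals_on_qnan(unsigned imm8)
{
    unsigned low = imm8 & 3u;
    bool signaling_in_low_half = (low == 1u || low == 2u);
    bool upper_half = (imm8 & 0x10u) != 0;
    return signaling_in_low_half != upper_half;
}
```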
VCMPPS (EVEX encoded versions)

(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k2[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    CMP ← SRC1[i+31:i] OP5 SRC2[31:0]
                ELSE
                    CMP ← SRC1[i+31:i] OP5 SRC2[i+31:i]
            FI;
            IF CMP = TRUE
                THEN DEST[j] ← 1;
                ELSE DEST[j] ← 0; FI;
        ELSE DEST[j] ← 0 ; zeroing-masking only
    FI;
ENDFOR
DEST[MAX_KL-1:KL] ← 0
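A software model of the EVEX loop above may help when checking opmask results. In this sketch, which is illustrative rather than an Intel reference model, the lane count `kl`, the writemask `k2`, the `no_writemask` flag, and the `broadcast` flag (standing in for EVEX.b = 1 with a memory operand) are explicit parameters, and `op5` is a caller-supplied predicate; `cmp_lt_os` is provided as one example. The 64-bit return value mirrors an opmask register with MAX_KL = 64.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Example OP5 predicate (LT_OS): false when either operand is a NaN. */
bool cmp_lt_os(float a, float b) { return a < b; }

/* Hypothetical model of the EVEX VCMPPS loop for KL = 4, 8, or 16 lanes.
   When broadcast is true, src2[0] is the single 32-bit memory element
   replicated to every lane. Masked-off lanes produce 0 (zeroing-masking),
   and every result bit at position KL and above is 0. */
uint64_t vcmpps_evex(const float *src1, const float *src2, int kl,
                     bool broadcast, uint16_t k2, bool no_writemask,
                     bool (*op5)(float, float))
{
    assert(kl == 4 || kl == 8 || kl == 16);
    uint64_t dest = 0;                    /* DEST[MAX_KL-1:KL] <- 0 */
    for (int j = 0; j < kl; j++) {
        if (!no_writemask && !((k2 >> j) & 1u))
            continue;                     /* DEST[j] <- 0 */
        float b = broadcast ? src2[0] : src2[j];
        if (op5(src1[j], b))
            dest |= UINT64_C(1) << j;     /* DEST[j] <- 1 */
    }
    return dest;
}
```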


VCMPPS (VEX.256 encoded version) 

CMP0 ← SRC1[31:0] OP5 SRC2[31:0];
CMP1 ← SRC1[63:32] OP5 SRC2[63:32];
CMP2 ← SRC1[95:64] OP5 SRC2[95:64];
CMP3 ← SRC1[127:96] OP5 SRC2[127:96];
CMP4 ← SRC1[159:128] OP5 SRC2[159:128];
CMP5 ← SRC1[191:160] OP5 SRC2[191:160];
CMP6 ← SRC1[223:192] OP5 SRC2[223:192];
CMP7 ← SRC1[255:224] OP5 SRC2[255:224];
IF CMP0 = TRUE
THEN DEST[31:0] ← FFFFFFFFH;
ELSE DEST[31:0] ← 00000000H; FI;
IF CMP1 = TRUE
THEN DEST[63:32] ← FFFFFFFFH;
ELSE DEST[63:32] ← 00000000H; FI;
IF CMP2 = TRUE
THEN DEST[95:64] ← FFFFFFFFH;
ELSE DEST[95:64] ← 00000000H; FI;
IF CMP3 = TRUE
THEN DEST[127:96] ← FFFFFFFFH;
ELSE DEST[127:96] ← 00000000H; FI;
IF CMP4 = TRUE
THEN DEST[159:128] ← FFFFFFFFH;
ELSE DEST[159:128] ← 00000000H; FI;
IF CMP5 = TRUE
THEN DEST[191:160] ← FFFFFFFFH;
ELSE DEST[191:160] ← 00000000H; FI;
IF CMP6 = TRUE
THEN DEST[223:192] ← FFFFFFFFH;
ELSE DEST[223:192] ← 00000000H; FI;
IF CMP7 = TRUE
THEN DEST[255:224] ← FFFFFFFFH;
ELSE DEST[255:224] ← 00000000H; FI;
DEST[MAX_VL-1:256] ← 0
VCMPPS (VEX.128 encoded version)

CMP0 ← SRC1[31:0] OP5 SRC2[31:0];
CMP1 ← SRC1[63:32] OP5 SRC2[63:32];
CMP2 ← SRC1[95:64] OP5 SRC2[95:64];
CMP3 ← SRC1[127:96] OP5 SRC2[127:96];
IF CMP0 = TRUE
THEN DEST[31:0] ← FFFFFFFFH;
ELSE DEST[31:0] ← 00000000H; FI;
IF CMP1 = TRUE
THEN DEST[63:32] ← FFFFFFFFH;
ELSE DEST[63:32] ← 00000000H; FI;
IF CMP2 = TRUE
THEN DEST[95:64] ← FFFFFFFFH;
ELSE DEST[95:64] ← 00000000H; FI;
IF CMP3 = TRUE
THEN DEST[127:96] ← FFFFFFFFH;
ELSE DEST[127:96] ← 00000000H; FI;
DEST[MAX_VL-1:128] ← 0


CMPPS (128-bit Legacy SSE version) 

CMP0 ← SRC1[31:0] OP3 SRC2[31:0];
CMP1 ← SRC1[63:32] OP3 SRC2[63:32];
CMP2 ← SRC1[95:64] OP3 SRC2[95:64];
CMP3 ← SRC1[127:96] OP3 SRC2[127:96];
IF CMP0 = TRUE
THEN DEST[31:0] ← FFFFFFFFH;
ELSE DEST[31:0] ← 00000000H; FI;
IF CMP1 = TRUE
THEN DEST[63:32] ← FFFFFFFFH;
ELSE DEST[63:32] ← 00000000H; FI;
IF CMP2 = TRUE
THEN DEST[95:64] ← FFFFFFFFH;
ELSE DEST[95:64] ← 00000000H; FI;
IF CMP3 = TRUE
THEN DEST[127:96] ← FFFFFFFFH;
ELSE DEST[127:96] ← 00000000H; FI;
DEST[MAX_VL-1:128] (Unmodified)
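The only difference between the legacy SSE and VEX.128 forms above lies in bits MAX_VL-1:128 of the destination. The sketch below models a 512-bit register as sixteen doublewords (assuming MAX_VL = 512) to make that difference concrete; the four lane results are passed in already computed, and the names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MAX_VL_DWORDS 16  /* assumption: MAX_VL = 512 bits */

/* Writes four CMPPS lane results into a modeled ZMM register. The legacy
   SSE form leaves bits MAX_VL-1:128 unmodified; the VEX.128 form zeroes them. */
void write_cmpps_result(uint32_t zmm[MAX_VL_DWORDS], const bool lanes[4],
                        bool vex_encoded)
{
    assert(zmm != 0 && lanes != 0);
    for (int j = 0; j < 4; j++)
        zmm[j] = lanes[j] ? 0xFFFFFFFFu : 0u;  /* bits 127:0 */
    if (vex_encoded)
        for (int j = 4; j < MAX_VL_DWORDS; j++)
            zmm[j] = 0;                         /* bits MAX_VL-1:128 */
}
```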

Intel C/C++ Compiler Intrinsic Equivalent 

VCMPPS __mmask16 _mm512_cmp_ps_mask(__m512 a, __m512 b, int imm);
VCMPPS __mmask16 _mm512_cmp_round_ps_mask(__m512 a, __m512 b, int imm, int sae);
VCMPPS __mmask16 _mm512_mask_cmp_ps_mask(__mmask16 k1, __m512 a, __m512 b, int imm);
VCMPPS __mmask16 _mm512_mask_cmp_round_ps_mask(__mmask16 k1, __m512 a, __m512 b, int imm, int sae);
VCMPPS __mmask8 _mm256_cmp_ps_mask(__m256 a, __m256 b, int imm);
VCMPPS __mmask8 _mm256_mask_cmp_ps_mask(__mmask8 k1, __m256 a, __m256 b, int imm);
VCMPPS __mmask8 _mm_cmp_ps_mask(__m128 a, __m128 b, int imm);
VCMPPS __mmask8 _mm_mask_cmp_ps_mask(__mmask8 k1, __m128 a, __m128 b, int imm);
VCMPPS __m256 _mm256_cmp_ps(__m256 a, __m256 b, int imm)
CMPPS __m128 _mm_cmp_ps(__m128 a, __m128 b, int imm)

SIMD Floating-Point Exceptions 

Invalid if SNaN operand and invalid if QNaN and predicate as listed in Table 3-1. 

Denormal 
Other Exceptions

VEX-encoded instructions, see Exceptions Type 2. 
EVEX-encoded instructions, see Exceptions Type E2. 
CMPS/CMPSB/CMPSW/CMPSD/CMPSQ-Compare String Operands


A6
CMPS m8, m8
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: Valid
    For legacy mode, compare byte at address DS:(E)SI with byte at address ES:(E)DI; for 64-bit mode compare byte
    at address (R|E)SI to byte at address (R|E)DI. The status flags are set accordingly.

A7
CMPS m16, m16
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: Valid
    For legacy mode, compare word at address DS:(E)SI with word at address ES:(E)DI; for 64-bit mode compare word
    at address (R|E)SI with word at address (R|E)DI. The status flags are set accordingly.

A7
CMPS m32, m32
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: Valid
    For legacy mode, compare dword at address DS:(E)SI with dword at address ES:(E)DI; for 64-bit mode compare
    dword at address (R|E)SI with dword at address (R|E)DI. The status flags are set accordingly.

REX.W + A7
CMPS m64, m64
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: N.E.
    Compares quadword at address (R|E)SI with quadword at address (R|E)DI and sets the status flags accordingly.

A6
CMPSB
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: Valid
    For legacy mode, compare byte at address DS:(E)SI with byte at address ES:(E)DI; for 64-bit mode compare byte
    at address (R|E)SI with byte at address (R|E)DI. The status flags are set accordingly.

A7
CMPSW
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: Valid
    For legacy mode, compare word at address DS:(E)SI with word at address ES:(E)DI; for 64-bit mode compare word
    at address (R|E)SI with word at address (R|E)DI. The status flags are set accordingly.

A7
CMPSD
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: Valid
    For legacy mode, compare dword at address DS:(E)SI with dword at address ES:(E)DI; for 64-bit mode compare
    dword at address (R|E)SI with dword at address (R|E)DI. The status flags are set accordingly.

REX.W + A7
CMPSQ
    Op/En: NP   64-Bit Mode: Valid   Compat/Leg Mode: N.E.
    Compares quadword at address (R|E)SI with quadword at address (R|E)DI and sets the status flags accordingly.


Instruction Operand Encoding 


Op/En   Operand 1   Operand 2   Operand 3   Operand 4
NP      NA          NA          NA          NA


Description 

Compares the byte, word, doubleword, or quadword specified with the first source operand with the byte, word, 
doubleword, or quadword specified with the second source operand and sets the status flags in the EFLAGS register 
according to the results. 

Both source operands are located in memory. The address of the first source operand is read from DS:SI, DS:ESI,
or RSI, depending on whether the address-size attribute of the instruction is 16, 32, or 64, respectively. The address
of the second source operand is read from ES:DI, ES:EDI, or RDI (again depending on whether the address-size
attribute of the instruction is 16, 32, or 64). The DS segment may be overridden with a segment override prefix, but
the ES segment cannot be overridden.

At the assembly-code level, two forms of this instruction are allowed: the "explicit-operands" form and the
"no-operands" form. The explicit-operands form (specified with the CMPS mnemonic) allows the two source operands
to be specified explicitly. Here, the source operands should be symbols that indicate the size and location of the
source values. This explicit-operand form is provided to allow documentation. However, note that the
documentation provided by this form can be misleading. That is, the source operand symbols must specify the correct
type (size) of the operands (bytes, words, doublewords, or quadwords), but they do not have to specify the correct
location. Locations of the source operands are always specified by the DS:(E)SI (or RSI) and ES:(E)DI (or RDI)
registers, which must be loaded correctly before the compare string instruction is executed.

The no-operands form provides "short forms" of the byte, word, and doubleword versions of the CMPS instructions.
Here also the DS:(E)SI (or RSI) and ES:(E)DI (or RDI) registers are assumed by the processor to specify the
location of the source operands. The size of the source operands is selected with the mnemonic: CMPSB (byte
comparison), CMPSW (word comparison), CMPSD (doubleword comparison), or CMPSQ (quadword comparison using
REX.W).

After the comparison, the (E/R)SI and (E/R)DI registers increment or decrement automatically according to the
setting of the DF flag in the EFLAGS register. (If the DF flag is 0, the (E/R)SI and (E/R)DI registers increment; if the
DF flag is 1, the registers decrement.) The registers increment or decrement by 1 for byte operations, by 2 for word
operations, or by 4 for doubleword operations. If the operand size is 64, the RSI and RDI registers increment or
decrement by 8 for quadword operations.
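The register update described above can be sketched as follows. This hypothetical helper assumes a 64-bit address size, so it does not model wraparound at 16-bit or 32-bit address sizes; `size` is the operand size in bytes and `df` is the value of EFLAGS.DF.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of how CMPS advances its index registers after one
   comparison, assuming a 64-bit address size (RSI/RDI). */
void cmps_advance(uint64_t *si, uint64_t *di, unsigned size, int df)
{
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    if (df == 0) {           /* DF = 0: increment */
        *si += size;
        *di += size;
    } else {                 /* DF = 1: decrement */
        *si -= size;
        *di -= size;
    }
}
```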

The CMPS, CMPSB, CMPSW, CMPSD, and CMPSQ instructions can be preceded by the REP prefix for block comparisons.
More often, however, these instructions will be used in a LOOP construct that takes some action based on the
setting of the status flags before the next comparison is made. See "REP/REPE/REPZ/REPNE/REPNZ—Repeat
String Operation Prefix" in Chapter 4 of the Intel® 64 and IA-32 Architectures Software Developer's Manual,
Volume 2B, for a description of the REP prefix.
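As an illustration of the REP-prefixed use, the sketch below models REPE CMPSB with DF = 0 in C. It assumes the REPE behavior described in Chapter 4 (repeat while the count register is nonzero and ZF = 1, with no iteration and unchanged flags when the count starts at zero); the function and parameter names are hypothetical, and segment and address-size details are omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of "REPE CMPSB" with DF = 0. Returns the number of
   comparisons performed; *zf receives ZF after the last comparison and is
   left untouched when count is 0. */
size_t repe_cmpsb(const uint8_t *si, const uint8_t *di, size_t count, int *zf)
{
    size_t done = 0;
    assert(zf != NULL);
    while (count != 0) {
        int equal = (si[done] == di[done]);  /* ZF = (SRC1 - SRC2 == 0) */
        *zf = equal;
        done++;                              /* SI and DI advance by 1 */
        count--;
        if (!equal)                          /* REPE stops when ZF = 0 */
            break;
    }
    return done;
}
```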

In 64-bit mode, the instruction's default address size is 64 bits; a 32-bit address size is supported using the
prefix 67H. Use of the REX.W prefix promotes doubleword operation to 64 bits (see CMPSQ). See the summary chart at
the beginning of this section for encoding data and limits.

Operation 

temp ← SRC1 - SRC2;
SetStatusFlags(temp);
IF (64-Bit Mode)
    THEN
        IF (Byte comparison)
            THEN IF DF = 0
                THEN
                    (R|E)SI ← (R|E)SI + 1;
                    (R|E)DI ← (R|E)DI + 1;
                ELSE
                    (R|E)SI ← (R|E)SI - 1;
                    (R|E)DI ← (R|E)DI - 1;
                FI;
        ELSE IF (Word comparison)
            THEN IF DF = 0
                THEN
                    (R|E)SI ← (R|E)SI + 2;
                    (R|E)DI ← (R|E)DI + 2;
                ELSE
                    (R|E)SI ← (R|E)SI - 2;
                    (R|E)DI ← (R|E)DI - 2;
                FI;
        ELSE IF (Doubleword comparison)
            THEN IF DF = 0
                THEN
                    (R|E)SI ← (R|E)SI + 4;
                    (R|E)DI ← (R|E)DI + 4;
                ELSE
                    (R|E)SI ← (R|E)SI - 4;
                    (R|E)DI ← (R|E)DI - 4;
                FI;


3-170 Vol. 2A 


CMPS/CMPSB/CMPSW/CMPSD/CMPSQ-Compare String Operands 


INSTRUCTION SET REFERENCE, A-L 


        ELSE (* Quadword comparison *)
            THEN IF DF = 0
                (R|E)SI ← (R|E)SI + 8;
                (R|E)DI ← (R|E)DI + 8;
            ELSE
                (R|E)SI ← (R|E)SI - 8;
                (R|E)DI ← (R|E)DI - 8;
            FI;
        FI;
    ELSE (* Non-64-bit Mode *)
        IF (Byte comparison)
            THEN IF DF = 0
                THEN
                    (E)SI ← (E)SI + 1;
                    (E)DI ← (E)DI + 1;
                ELSE
                    (E)SI ← (E)SI - 1;
                    (E)DI ← (E)DI - 1;
                FI;
        ELSE IF (Word comparison)
            THEN IF DF = 0
                (E)SI ← (E)SI + 2;
                (E)DI ← (E)DI + 2;
            ELSE
                (E)SI ← (E)SI - 2;
                (E)DI ← (E)DI - 2;
            FI;
        ELSE (* Doubleword comparison *)
            THEN IF DF = 0
                (E)SI ← (E)SI + 4;
                (E)DI ← (E)DI + 4;
            ELSE
                (E)SI ← (E)SI - 4;
                (E)DI ← (E)DI - 4;
            FI;
        FI;
FI;



Flags Affected 

The CF, OF, SF, ZF, AF, and PF flags are set according to the temporary result of the comparison. 
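For a byte-sized comparison, the flag results can be sketched in C from temp = SRC1 - SRC2. The structure and function below are hypothetical; the formulas for CF (unsigned borrow), OF (signed overflow), AF (borrow out of bit 3), and PF (even parity of the low byte) follow the usual definitions of these flags for subtraction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct status_flags { bool cf, pf, af, zf, sf, of; };

/* Hypothetical sketch of the flags a byte CMPS derives from SRC1 - SRC2. */
struct status_flags cmps_byte_flags(uint8_t src1, uint8_t src2)
{
    uint8_t temp = (uint8_t)(src1 - src2);
    struct status_flags f;
    f.cf = src1 < src2;                                    /* unsigned borrow */
    f.zf = temp == 0;
    f.sf = (temp & 0x80) != 0;
    f.of = (((src1 ^ src2) & (src1 ^ temp)) & 0x80) != 0;  /* signed overflow */
    f.af = ((src1 ^ src2 ^ temp) & 0x10) != 0;             /* borrow from bit 4 */
    f.pf = true;                                           /* even parity */
    for (int i = 0; i < 8; i++)
        if ((temp >> i) & 1)
            f.pf = !f.pf;
    assert(!(f.zf && f.sf));  /* a zero result cannot be negative */
    return f;
}
```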


Protected Mode Exceptions 


#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
       If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege
       level is 3.

#UD If the LOCK prefix is used.
Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege
level is 3.

#UD If the LOCK prefix is used. 
CMPSD—Compare Scalar Double-Precision Floating-Point Value


F2 0F C2 /r ib
CMPSD xmm1, xmm2/m64, imm8
    Op/En: RMI   64/32 bit Mode Support: V/V   CPUID Feature Flag: SSE2
    Compare low double-precision floating-point value in xmm2/m64 and xmm1 using bits 2:0 of imm8 as comparison
    predicate.

VEX.NDS.128.F2.0F.WIG C2 /r ib
VCMPSD xmm1, xmm2, xmm3/m64, imm8
    Op/En: RVMI   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX
    Compare low double-precision floating-point value in xmm3/m64 and xmm2 using bits 4:0 of imm8 as comparison
    predicate.

EVEX.NDS.LIG.F2.0F.W1 C2 /r ib
VCMPSD k1 {k2}, xmm2, xmm3/m64{sae}, imm8
    Op/En: T1S   64/32 bit Mode Support: V/V   CPUID Feature Flag: AVX512F
    Compare low double-precision floating-point value in xmm3/m64 and xmm2 using bits 4:0 of imm8 as comparison
    predicate with writemask k2 and leave the result in mask register k1.


Instruction Operand Encoding 


Op/En   Operand 1          Operand 2       Operand 3       Operand 4
RMI     ModRM:reg (r, w)   ModRM:r/m (r)   Imm8            NA
RVMI    ModRM:reg (w)      VEX.vvvv        ModRM:r/m (r)   Imm8
T1S     ModRM:reg (w)      EVEX.vvvv       ModRM:r/m (r)   Imm8


Description 

Compares the low double-precision floating-point values in the second source operand and the first source operand
and returns the result of the comparison to the destination operand. The comparison predicate operand (immediate
operand) specifies the type of comparison performed.

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The
second source operand (second operand) can be an XMM register or 64-bit memory location. Bits (MAX_VL-1:64)
of the corresponding YMM destination register remain unchanged. The comparison result is a quadword mask of all
1s (comparison true) or all 0s (comparison false).

VEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source
operand (third operand) can be an XMM register or a 64-bit memory location. The result is stored in the low
quadword of the destination operand; the high quadword is filled with the contents of the high quadword of the first
source operand. Bits (MAX_VL-1:128) of the destination ZMM register are zeroed. The comparison result is a
quadword mask of all 1s (comparison true) or all 0s (comparison false).
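The difference in how the legacy and VEX.128 forms fill bits 127:64 can be modeled in a few lines. In this sketch the 128-bit register is represented as two quadwords, `cmp0` is the already-computed predicate result on the low quadwords, and the names are hypothetical; bits above 127 are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALL_ONES UINT64_C(0xFFFFFFFFFFFFFFFF)

/* Legacy CMPSD: DEST is also the first source; only bits 63:0 change. */
void cmpsd_legacy(uint64_t dest[2], bool cmp0)
{
    assert(dest != 0);
    dest[0] = cmp0 ? ALL_ONES : 0;
    /* dest[1] (bits 127:64) is left unmodified. */
}

/* VEX.128 VCMPSD: bits 127:64 are copied from the first source operand. */
void vcmpsd_vex128(uint64_t dest[2], const uint64_t src1[2], bool cmp0)
{
    assert(dest != 0 && src1 != 0);
    dest[0] = cmp0 ? ALL_ONES : 0;
    dest[1] = src1[1];
}
```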

EVEX encoded version: The first source operand (second operand) is an XMM register. The second source operand
can be an XMM register or a 64-bit memory location. The destination operand (first operand) is an opmask register.
The comparison result is a single mask bit of 1 (comparison true) or 0 (comparison false), written to the destination
starting from the LSB according to the writemask k2. Bits (MAX_KL-1:1) of the destination register are cleared.

The comparison predicate operand is an 8-bit immediate: 

• For instructions encoded using the VEX prefix and EVEX prefix, bits 4:0 define the type of comparison to be
performed (see Table 3-1). Bits 5 through 7 of the immediate are reserved.

• For instruction encodings that do not use VEX prefix, bits 2:0 define the type of comparison to be made (see 
the first 8 rows of Table 3-1). Bits 3 through 7 of the immediate are reserved. 

The unordered relationship is true when at least one of the two source operands being compared is a NaN; the 
ordered relationship is true when neither source operand is a NaN. 

A subsequent computational instruction that uses the mask result in the destination operand as an input operand 
will not generate an exception, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask
of all 1s corresponds to a QNaN.

Note that processors with "CPUID.1H:ECX.AVX = 0" do not implement the "greater-than", "greater-than-or-equal",
"not-greater-than", and "not-greater-than-or-equal" relation predicates. These comparisons can be made either
by using the inverse relationship (that is, use the "not-less-than-or-equal" to make a "greater-than" comparison)
or by using software emulation. When using software emulation, the program must swap the operands (copying
registers when necessary to protect the data that will now be in the destination), and then perform the compare
using a different predicate. The predicate to be used for these emulations is listed in the first 8 rows of Table 3-7
(Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A) under the heading Emulation.

Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand
CMPSD instruction, for processors with "CPUID.1H:ECX.AVX = 0". See Table 3-6. Compilers should treat reserved
Imm8 values as illegal syntax.


Table 3-6. Pseudo-Op and CMPSD Implementation 


Pseudo-Op                   CMPSD Implementation
CMPEQSD xmm1, xmm2          CMPSD xmm1, xmm2, 0
CMPLTSD xmm1, xmm2          CMPSD xmm1, xmm2, 1
CMPLESD xmm1, xmm2          CMPSD xmm1, xmm2, 2
CMPUNORDSD xmm1, xmm2       CMPSD xmm1, xmm2, 3
CMPNEQSD xmm1, xmm2         CMPSD xmm1, xmm2, 4
CMPNLTSD xmm1, xmm2         CMPSD xmm1, xmm2, 5
CMPNLESD xmm1, xmm2         CMPSD xmm1, xmm2, 6
CMPORDSD xmm1, xmm2         CMPSD xmm1, xmm2, 7


The greater-than relations that the processor does not implement require more than one instruction to emulate in 
software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the 
operands of the corresponding less-than relations and use move instructions to ensure that the mask is moved to
the correct destination register and that the source operand is left intact.) 

Processors with "CPUID.1H:ECX.AVX = 1" implement the full complement of 32 predicates shown in Table 3-7, so
software emulation is no longer needed. Compilers and assemblers may implement the following three-operand
pseudo-ops in addition to the four-operand VCMPSD instruction. See Table 3-7, where the notations reg1, reg2,
and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved Imm8 values as illegal
syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic
interface. Compilers and assemblers may implement three-operand pseudo-ops for EVEX encoded VCMPSD instructions
in a similar fashion by extending the syntax listed in Table 3-7.


Table 3-7. Pseudo-Op and VCMPSD Implementation 


Pseudo-Op                          CMPSD Implementation
VCMPEQSD reg1, reg2, reg3          VCMPSD reg1, reg2, reg3, 0
VCMPLTSD reg1, reg2, reg3          VCMPSD reg1, reg2, reg3, 1
VCMPLESD reg1, reg2, reg3          VCMPSD reg1, reg2, reg3, 2
VCMPUNORDSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 3
VCMPNEQSD reg1, reg2, reg3         VCMPSD reg1, reg2, reg3, 4
VCMPNLTSD reg1, reg2, reg3         VCMPSD reg1, reg2, reg3, 5
VCMPNLESD reg1, reg2, reg3         VCMPSD reg1, reg2, reg3, 6
VCMPORDSD reg1, reg2, reg3         VCMPSD reg1, reg2, reg3, 7
VCMPEQ_UQSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 8
VCMPNGESD reg1, reg2, reg3         VCMPSD reg1, reg2, reg3, 9
VCMPNGTSD reg1, reg2, reg3         VCMPSD reg1, reg2, reg3, 0AH
VCMPFALSESD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 0BH
VCMPNEQ_OQSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 0CH
VCMPGESD reg1, reg2, reg3          VCMPSD reg1, reg2, reg3, 0DH
VCMPGTSD reg1, reg2, reg3          VCMPSD reg1, reg2, reg3, 0EH
VCMPTRUESD reg1, reg2, reg3        VCMPSD reg1, reg2, reg3, 0FH
VCMPEQ_OSSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 10H
VCMPLT_OQSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 11H
VCMPLE_OQSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 12H
VCMPUNORD_SSD reg1, reg2, reg3     VCMPSD reg1, reg2, reg3, 13H
VCMPNEQ_USSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 14H
VCMPNLT_UQSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 15H
VCMPNLE_UQSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 16H
VCMPORD_SSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 17H
VCMPEQ_USSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 18H
VCMPNGE_UQSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 19H
VCMPNGT_UQSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 1AH
VCMPFALSE_OSSD reg1, reg2, reg3    VCMPSD reg1, reg2, reg3, 1BH
VCMPNEQ_OSSD reg1, reg2, reg3      VCMPSD reg1, reg2, reg3, 1CH
VCMPGE_OQSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 1DH
VCMPGT_OQSD reg1, reg2, reg3       VCMPSD reg1, reg2, reg3, 1EH
VCMPTRUE_USSD reg1, reg2, reg3     VCMPSD reg1, reg2, reg3, 1FH


Software should ensure VCMPSD is encoded with VEX.L=0. Encoding VCMPSD with VEX.L=1 may encounter
unpredictable behavior across different processor generations.


Operation 

CASE (COMPARISON PREDICATE) OF
    0: OP3 ← EQ_OQ; OP5 ← EQ_OQ;
    1: OP3 ← LT_OS; OP5 ← LT_OS;
    2: OP3 ← LE_OS; OP5 ← LE_OS;
    3: OP3 ← UNORD_Q; OP5 ← UNORD_Q;
    4: OP3 ← NEQ_UQ; OP5 ← NEQ_UQ;
    5: OP3 ← NLT_US; OP5 ← NLT_US;
    6: OP3 ← NLE_US; OP5 ← NLE_US;
    7: OP3 ← ORD_Q; OP5 ← ORD_Q;
    8: OP5 ← EQ_UQ;
    9: OP5 ← NGE_US;
    10: OP5 ← NGT_US;
    11: OP5 ← FALSE_OQ;
    12: OP5 ← NEQ_OQ;
    13: OP5 ← GE_OS;
    14: OP5 ← GT_OS;
    15: OP5 ← TRUE_UQ;
    16: OP5 ← EQ_OS;
    17: OP5 ← LT_OQ;
    18: OP5 ← LE_OQ;
    19: OP5 ← UNORD_S;
    20: OP5 ← NEQ_US;
    21: OP5 ← NLT_UQ;


CMPSD—Compare Scalar Double-Precision Floating-Point Value 

Vol.2A 3-175 






















INSTRUCTION SET REFERENCE, A-L 


    22: OP5 ← NLE_UQ;
    23: OP5 ← ORD_S;
    24: OP5 ← EQ_US;
    25: OP5 ← NGE_UQ;
    26: OP5 ← NGT_UQ;
    27: OP5 ← FALSE_OS;
    28: OP5 ← NEQ_OS;
    29: OP5 ← GE_OQ;
    30: OP5 ← GT_OQ;
    31: OP5 ← TRUE_US;
    DEFAULT: Reserved
ESAC;

VCMPSD (EVEX encoded version)
CMP0 ← SRC1[63:0] OP5 SRC2[63:0];
IF k2[0] or *no writemask*
    THEN IF CMP0 = TRUE
        THEN DEST[0] ← 1;
        ELSE DEST[0] ← 0; FI;
    ELSE DEST[0] ← 0; (* zeroing-masking only *)
FI;
DEST[MAX_KL-1:1] ← 0

CMPSD (128-bit Legacy SSE version)
CMP0 ← DEST[63:0] OP3 SRC[63:0];
IF CMP0 = TRUE
    THEN DEST[63:0] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[63:0] ← 0000000000000000H; FI;
DEST[MAX_VL-1:64] (Unmodified)

VCMPSD (VEX.128 encoded version)
CMP0 ← SRC1[63:0] OP5 SRC2[63:0];
IF CMP0 = TRUE
    THEN DEST[63:0] ← FFFFFFFFFFFFFFFFH;
    ELSE DEST[63:0] ← 0000000000000000H; FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0
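The predicate dispatch and mask generation above can be modelled in portable C. The sketch below is a hypothetical reference model, not Intel's implementation: `cmp_predicate`, `signals_on_qnan`, and `cmpsd_mask` are invented names, IEEE-754 doubles are assumed, and each predicate's behavior is derived from its mnemonic (EQ/LT/LE/UNORD and their negations, `_O`/`_U` for the result on NaN operands, `_S`/`_Q` for whether a QNaN signals). SNaN detection, MXCSR state, and exception delivery are not modelled.

```c
#include <math.h>
#include <stdbool.h>
#include <stdint.h>

/* Result of comparison predicate imm8[4:0] applied as "a OP b".
   Bits 2:0 select the base relation, bit 3 inverts the result
   returned for unordered operands, and bit 4 only changes whether
   a QNaN operand signals (see signals_on_qnan). */
static bool cmp_predicate(double a, double b, unsigned imm8)
{
    unsigned p = imm8 & 0x1Fu;

    if (isnan(a) || isnan(b)) {
        /* UNORD, NEQ_UQ, NLT_US and NLE_US (3..6) are true when unordered. */
        unsigned low3 = p & 7u;
        bool r = (low3 >= 3u && low3 <= 6u);
        return (p & 8u) ? !r : r;
    }

    switch (p & 0xFu) {
    case 0x0: case 0x8: return a == b;   /* EQ                  */
    case 0x1: case 0x9: return a < b;    /* LT, NGE             */
    case 0x2: case 0xA: return a <= b;   /* LE, NGT             */
    case 0x3: case 0xB: return false;    /* UNORD, FALSE        */
    case 0x4: case 0xC: return a != b;   /* NEQ                 */
    case 0x5: case 0xD: return a >= b;   /* NLT, GE             */
    case 0x6: case 0xE: return a > b;    /* NLE, GT             */
    default:            return true;     /* ORD, TRUE (7 and F) */
    }
}

/* True if a QNaN operand raises #I for this predicate (the _S variants).
   Base relations with bits 2:0 = 1, 2, 5, 6 (LT, LE, NLT, NLE and their
   NGE, NGT, GE, GT twins) signal; bit 4 toggles that behavior. */
static bool signals_on_qnan(unsigned imm8)
{
    unsigned p = imm8 & 0x1Fu;
    unsigned low3 = p & 7u;
    bool s = (low3 == 1u || low3 == 2u || low3 == 5u || low3 == 6u);
    return (p & 0x10u) ? !s : s;
}

/* 64-bit mask written to DEST[63:0] by CMPSD/VCMPSD (VEX.128).
   Legacy SSE encodings honor only imm8[2:0], so pass imm8 & 7 for them. */
static uint64_t cmpsd_mask(double a, double b, unsigned imm8)
{
    return cmp_predicate(a, b, imm8) ? UINT64_MAX : UINT64_C(0);
}
```

For example, `cmpsd_mask(x, y, 0x1E)` models `VCMPGT_OQSD`: all 1s only when both inputs are ordered and `x > y`, and no #I for a QNaN input.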

Intel C/C++ Compiler Intrinsic Equivalent 

VCMPSD __mmask8 _mm_cmp_sd_mask(__m128d a, __m128d b, int imm);

VCMPSD __mmask8 _mm_cmp_round_sd_mask(__m128d a, __m128d b, int imm, int sae);

VCMPSD __mmask8 _mm_mask_cmp_sd_mask(__mmask8 k1, __m128d a, __m128d b, int imm);

VCMPSD __mmask8 _mm_mask_cmp_round_sd_mask(__mmask8 k1, __m128d a, __m128d b, int imm, int sae);

(V)CMPSD __m128d _mm_cmp_sd(__m128d a, __m128d b, const int imm)

SIMD Floating-Point Exceptions 

Invalid if SNaN operand, Invalid if QNaN and predicate as listed in Table 3-1, Denormal.

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3. 
CMPSS—Compare Scalar Single-Precision Floating-Point Value


Opcode/Instruction

Op/En

64/32 bit Mode Support

CPUID Feature Flag

Description

F3 0F C2 /r ib
CMPSS xmm1, xmm2/m32, imm8

RMI

V/V

SSE

Compare low single-precision floating-point value in xmm2/m32 and xmm1 using bits 2:0 of imm8 as comparison predicate.

VEX.NDS.128.F3.0F.WIG C2 /r ib
VCMPSS xmm1, xmm2, xmm3/m32, imm8

RVMI

V/V

AVX

Compare low single-precision floating-point value in xmm3/m32 and xmm2 using bits 4:0 of imm8 as comparison predicate.

EVEX.NDS.LIG.F3.0F.W0 C2 /r ib
VCMPSS k1 {k2}, xmm2, xmm3/m32{sae}, imm8

T1S

V/V

AVX512F

Compare low single-precision floating-point value in xmm3/m32 and xmm2 using bits 4:0 of imm8 as comparison predicate with writemask k2 and leave the result in mask register k1.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

RMI 

ModRM:reg (r, w) 

ModRM:r/m (r) 

Imm8

NA

RVMI

ModRM:reg (w)

VEX.vvvv

ModRM:r/m (r)

Imm8

T1S

ModRM:reg (w)

EVEX.vvvv

ModRM:r/m (r)

Imm8


Description 

Compares the low single-precision floating-point values in the second source operand and the first source operand and returns the results of the comparison to the destination operand. The comparison predicate operand (immediate operand) specifies the type of comparison performed.

128-bit Legacy SSE version: The first source and destination operand (first operand) is an XMM register. The second source operand (second operand) can be an XMM register or 32-bit memory location. Bits (MAX_VL-1:32) of the corresponding YMM destination register remain unchanged. The comparison result is a doubleword mask of all 1s (comparison true) or all 0s (comparison false).

VEX.128 encoded version: The first source operand (second operand) is an XMM register. The second source operand (third operand) can be an XMM register or a 32-bit memory location. The result is stored in the low 32 bits of the destination operand; bits 127:32 of the destination operand are copied from the first source operand. Bits (MAX_VL-1:128) of the destination ZMM register are zeroed. The comparison result is a doubleword mask of all 1s (comparison true) or all 0s (comparison false).

EVEX encoded version: The first source operand (second operand) is an XMM register. The second source operand can be an XMM register or a 32-bit memory location. The destination operand (first operand) is an opmask register. The comparison result is a single mask bit of 1 (comparison true) or 0 (comparison false), written to the destination starting from the LSB according to the writemask k2. Bits (MAX_KL-1:1) of the destination register are cleared.

The comparison predicate operand is an 8-bit immediate: 

• For instructions encoded using the VEX prefix, bits 4:0 define the type of comparison to be performed (see 
Table 3-1). Bits 5 through 7 of the immediate are reserved. 

• For instruction encodings that do not use VEX prefix, bits 2:0 define the type of comparison to be made (see 
the first 8 rows of Table 3-1). Bits 3 through 7 of the immediate are reserved. 


The unordered relationship is true when at least one of the two source operands being compared is a NaN; the 
ordered relationship is true when neither source operand is a NaN. 

A subsequent computational instruction that uses the mask result in the destination operand as an input operand will not generate an exception, because a mask of all 0s corresponds to a floating-point value of +0.0 and a mask of all 1s corresponds to a QNaN.
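The claim that an all-1s mask is a QNaN and an all-0s mask is +0.0 can be checked by reinterpreting the bit patterns. The sketch below assumes `float` is IEEE-754 binary32 (as on x86 compilers); `bits_to_float`, `float_to_bits`, and `select_by_mask` are hypothetical helper names. `select_by_mask` mirrors the common ANDPS/ANDNPS/ORPS idiom for turning a compare mask into a branch-free select.

```c
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Reinterpret a 32-bit pattern as a float without violating aliasing rules. */
static float bits_to_float(uint32_t bits)
{
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

static uint32_t float_to_bits(float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    return bits;
}

/* Branch-free select: mask must be 0xFFFFFFFF (pick a) or 0 (pick b),
   i.e. exactly the doubleword values CMPSS writes. */
static float select_by_mask(uint32_t mask, float a, float b)
{
    uint32_t r = (mask & float_to_bits(a)) | (~mask & float_to_bits(b));
    return bits_to_float(r);
}
```

Because neither pattern is a denormal or an SNaN, feeding the mask into later arithmetic raises no exception, which is the point the paragraph above makes.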

Note that processors with "CPUID.1H:ECX.AVX = 0" do not implement the "greater-than", "greater-than-or-equal", "not-greater-than", and "not-greater-than-or-equal" relation predicates. These comparisons can be made either by using the inverse relationship (that is, use the "not-less-than-or-equal" to make a "greater-than" comparison) or by using software emulation. When using software emulation, the program must swap the operands (copying registers when necessary to protect the data that will now be in the destination), and then perform the compare using a different predicate. The predicate to be used for these emulations is listed in the first 8 rows of Table 3-7 (Intel 64 and IA-32 Architectures Software Developer's Manual Volume 2A) under the heading Emulation.
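The two emulation strategies differ when a NaN is involved. The sketch below models the legacy predicates with C relational operators (`pred_lt`, `pred_nle`, `gt_by_swap`, and `gt_by_nle` are hypothetical names): swapping the operands of LT reproduces an ordered greater-than exactly, while the inverse relation NLE also returns true for unordered operands, so it matches GT only when neither input is a NaN.

```c
#include <math.h>
#include <stdbool.h>

/* Stand-ins for legacy predicates 1 (LT_OS) and 6 (NLE_US), operands (a, b). */
static bool pred_lt(float a, float b)  { return a < b; }      /* false if unordered */
static bool pred_nle(float a, float b) { return !(a <= b); }  /* true if unordered  */

/* Greater-than by swapping operands: a > b is evaluated as b < a. */
static bool gt_by_swap(float a, float b) { return pred_lt(b, a); }

/* Greater-than approximated by the inverse relation NLE. */
static bool gt_by_nle(float a, float b)  { return pred_nle(a, b); }
```

For ordered inputs both helpers agree; with a NaN input `gt_by_swap` returns false and `gt_by_nle` returns true.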

Compilers and assemblers may implement the following two-operand pseudo-ops in addition to the three-operand CMPSS instruction, for processors with "CPUID.1H:ECX.AVX = 0". See Table 3-8. Compilers should treat reserved imm8 values as illegal syntax.


Table 3-8. Pseudo-Op and CMPSS Implementation 


Pseudo-Op 

CMPSS Implementation 

CMPEQSS xmm1, xmm2

CMPSS xmm1, xmm2, 0

CMPLTSS xmm1, xmm2

CMPSS xmm1, xmm2, 1

CMPLESS xmm1, xmm2

CMPSS xmm1, xmm2, 2

CMPUNORDSS xmm1, xmm2

CMPSS xmm1, xmm2, 3

CMPNEQSS xmm1, xmm2

CMPSS xmm1, xmm2, 4

CMPNLTSS xmm1, xmm2

CMPSS xmm1, xmm2, 5

CMPNLESS xmm1, xmm2

CMPSS xmm1, xmm2, 6

CMPORDSS xmm1, xmm2

CMPSS xmm1, xmm2, 7


The greater-than relations that the processor does not implement require more than one instruction to emulate in 
software and therefore should not be implemented as pseudo-ops. (For these, the programmer should reverse the 
operands of the corresponding less than relations and use move instructions to ensure that the mask is moved to 
the correct destination register and that the source operand is left intact.) 

Processors with "CPUID.1H:ECX.AVX = 1" implement the full complement of 32 predicates shown in Table 3-7, so software emulation is no longer needed. Compilers and assemblers may implement the following three-operand pseudo-ops in addition to the four-operand VCMPSS instruction. See Table 3-9, where the notations of reg1, reg2, and reg3 represent either XMM registers or YMM registers. Compilers should treat reserved imm8 values as illegal syntax. Alternately, intrinsics can map the pseudo-ops to pre-defined constants to support a simpler intrinsic interface. Compilers and assemblers may implement three-operand pseudo-ops for EVEX encoded VCMPSS instructions in a similar fashion by extending the syntax listed in Table 3-9.


Table 3-9. Pseudo-Op and VCMPSS Implementation 


Pseudo-Op 

CMPSS Implementation 

VCMPEQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0

VCMPLTSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1

VCMPLESS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 2

VCMPUNORDSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 3

VCMPNEQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 4

VCMPNLTSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 5

VCMPNLESS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 6

VCMPORDSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 7

VCMPEQ_UQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 8

VCMPNGESS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 9

VCMPNGTSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0AH

VCMPFALSESS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0BH

VCMPNEQ_OQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0CH

VCMPGESS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0DH

VCMPGTSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0EH

VCMPTRUESS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 0FH

VCMPEQ_OSSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 10H

VCMPLT_OQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 11H

VCMPLE_OQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 12H

VCMPUNORD_SSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 13H

VCMPNEQ_USSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 14H

VCMPNLT_UQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 15H

VCMPNLE_UQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 16H

VCMPORD_SSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 17H

VCMPEQ_USSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 18H

VCMPNGE_UQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 19H

VCMPNGT_UQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1AH

VCMPFALSE_OSSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1BH

VCMPNEQ_OSSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1CH

VCMPGE_OQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1DH

VCMPGT_OQSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1EH

VCMPTRUE_USSS reg1, reg2, reg3

VCMPSS reg1, reg2, reg3, 1FH
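An assembler or disassembler that supports these pseudo-ops needs a mapping from imm8[4:0] to the predicate spelling. The sketch below builds the mnemonics of Tables 3-7 and 3-9 from a 32-entry name table whose contents follow those tables; `kPredicateNames` and `pseudo_op_name` are hypothetical names. Rejecting imm8 values above 1FH follows the guidance to treat reserved imm8 values as illegal syntax (for the legacy two-operand forms, values above 7 would be reserved instead).

```c
#include <stddef.h>
#include <stdio.h>

/* Predicate spellings used in VCMPccSS/VCMPccSD pseudo-ops, indexed by imm8[4:0]. */
static const char *const kPredicateNames[32] = {
    "EQ",    "LT",     "LE",     "UNORD",    "NEQ",    "NLT",    "NLE",    "ORD",
    "EQ_UQ", "NGE",    "NGT",    "FALSE",    "NEQ_OQ", "GE",     "GT",     "TRUE",
    "EQ_OS", "LT_OQ",  "LE_OQ",  "UNORD_S",  "NEQ_US", "NLT_UQ", "NLE_UQ", "ORD_S",
    "EQ_US", "NGE_UQ", "NGT_UQ", "FALSE_OS", "NEQ_OS", "GE_OQ",  "GT_OQ",  "TRUE_US",
};

/* Format the pseudo-op mnemonic for imm8 and a suffix such as "SS" or "SD".
   Returns 0 on success, -1 if imm8 is reserved or buf is too small. */
static int pseudo_op_name(unsigned imm8, const char *suffix, char *buf, size_t len)
{
    if (imm8 > 0x1Fu)
        return -1;                          /* reserved: treat as illegal syntax */
    int n = snprintf(buf, len, "VCMP%s%s", kPredicateNames[imm8], suffix);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```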


Software should ensure VCMPSS is encoded with VEX.L=0. Encoding VCMPSS with VEX.L=1 may encounter unpredictable behavior across different processor generations.


Operation

CASE (COMPARISON PREDICATE) OF
    0: OP3 ← EQ_OQ; OP5 ← EQ_OQ;
    1: OP3 ← LT_OS; OP5 ← LT_OS;
    2: OP3 ← LE_OS; OP5 ← LE_OS;
    3: OP3 ← UNORD_Q; OP5 ← UNORD_Q;
    4: OP3 ← NEQ_UQ; OP5 ← NEQ_UQ;
    5: OP3 ← NLT_US; OP5 ← NLT_US;
    6: OP3 ← NLE_US; OP5 ← NLE_US;
    7: OP3 ← ORD_Q; OP5 ← ORD_Q;
    8: OP5 ← EQ_UQ;
    9: OP5 ← NGE_US;
    10: OP5 ← NGT_US;
    11: OP5 ← FALSE_OQ;
    12: OP5 ← NEQ_OQ;
    13: OP5 ← GE_OS;
    14: OP5 ← GT_OS;
    15: OP5 ← TRUE_UQ;
    16: OP5 ← EQ_OS;
    17: OP5 ← LT_OQ;
    18: OP5 ← LE_OQ;
    19: OP5 ← UNORD_S;
    20: OP5 ← NEQ_US;
    21: OP5 ← NLT_UQ;
    22: OP5 ← NLE_UQ;
    23: OP5 ← ORD_S;
    24: OP5 ← EQ_US;
    25: OP5 ← NGE_UQ;
    26: OP5 ← NGT_UQ;
    27: OP5 ← FALSE_OS;
    28: OP5 ← NEQ_OS;
    29: OP5 ← GE_OQ;
    30: OP5 ← GT_OQ;
    31: OP5 ← TRUE_US;
    DEFAULT: Reserved
ESAC;

VCMPSS (EVEX encoded version)
CMP0 ← SRC1[31:0] OP5 SRC2[31:0];
IF k2[0] or *no writemask*
    THEN IF CMP0 = TRUE
        THEN DEST[0] ← 1;
        ELSE DEST[0] ← 0; FI;
    ELSE DEST[0] ← 0; (* zeroing-masking only *)
FI;
DEST[MAX_KL-1:1] ← 0

CMPSS (128-bit Legacy SSE version)
CMP0 ← DEST[31:0] OP3 SRC[31:0];
IF CMP0 = TRUE
    THEN DEST[31:0] ← FFFFFFFFH;
    ELSE DEST[31:0] ← 00000000H; FI;
DEST[MAX_VL-1:32] (Unmodified)

VCMPSS (VEX.128 encoded version)
CMP0 ← SRC1[31:0] OP5 SRC2[31:0];
IF CMP0 = TRUE
    THEN DEST[31:0] ← FFFFFFFFH;
    ELSE DEST[31:0] ← 00000000H; FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0
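The three Operation variants differ mainly in what happens outside the compared element. The sketch below is a hypothetical C model of a 128-bit XMM register as four 32-bit lanes (`xmm_t`, `xmm_from_floats`, `cmpss_lt_legacy`, `vcmpss_lt_vex`, and `vcmpss_lt_evex` are invented names). It uses only the LT predicate for brevity and ignores bits above 127 (which the VEX form zeroes) and opmask bits above bit 0 (which the EVEX form clears).

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* 128-bit XMM value as four 32-bit lanes; lane 0 holds bits 31:0. */
typedef struct { uint32_t lane[4]; } xmm_t;

static xmm_t xmm_from_floats(float f0, float f1, float f2, float f3)
{
    float f[4] = { f0, f1, f2, f3 };
    xmm_t x;
    memcpy(x.lane, f, sizeof f);
    return x;
}

static float lane0(const xmm_t *x)
{
    float f;
    memcpy(&f, &x->lane[0], sizeof f);
    return f;
}

/* Legacy CMPSS (LT): DEST[31:0] gets the mask, DEST[127:32] is unchanged. */
static void cmpss_lt_legacy(xmm_t *dest, const xmm_t *src)
{
    dest->lane[0] = (lane0(dest) < lane0(src)) ? 0xFFFFFFFFu : 0u;
}

/* VEX.128 VCMPSS (LT): DEST[127:32] is copied from SRC1. */
static void vcmpss_lt_vex(xmm_t *dest, const xmm_t *src1, const xmm_t *src2)
{
    xmm_t out = *src1;                     /* upper lanes come from SRC1 */
    out.lane[0] = (lane0(src1) < lane0(src2)) ? 0xFFFFFFFFu : 0u;
    *dest = out;                           /* safe even if dest aliases a source */
}

/* EVEX VCMPSS (LT): one result bit in k1[0], zeroed when k2[0] is clear. */
static uint8_t vcmpss_lt_evex(const xmm_t *src1, const xmm_t *src2, bool k2_bit0)
{
    bool r = lane0(src1) < lane0(src2);
    return (uint8_t)((k2_bit0 && r) ? 1u : 0u);
}
```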

Intel C/C++ Compiler Intrinsic Equivalent 

VCMPSS __mmask8 _mm_cmp_ss_mask(__m128 a, __m128 b, int imm);

VCMPSS __mmask8 _mm_cmp_round_ss_mask(__m128 a, __m128 b, int imm, int sae);

VCMPSS __mmask8 _mm_mask_cmp_ss_mask(__mmask8 k1, __m128 a, __m128 b, int imm);

VCMPSS __mmask8 _mm_mask_cmp_round_ss_mask(__mmask8 k1, __m128 a, __m128 b, int imm, int sae);

(V)CMPSS __m128 _mm_cmp_ss(__m128 a, __m128 b, const int imm)

SIMD Floating-Point Exceptions 

Invalid if SNaN operand, Invalid if QNaN and predicate as listed in Table 3-1, Denormal. 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3. 
CMPXCHG—Compare and Exchange


Opcode/Instruction

Op/En

64-Bit Mode

Compat/Leg Mode

Description

0F B0 /r
CMPXCHG r/m8, r8

MR

Valid

Valid*

Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL.

REX + 0F B0 /r
CMPXCHG r/m8**, r8

MR

Valid

N.E.

Compare AL with r/m8. If equal, ZF is set and r8 is loaded into r/m8. Else, clear ZF and load r/m8 into AL.

0F B1 /r
CMPXCHG r/m16, r16

MR

Valid

Valid*

Compare AX with r/m16. If equal, ZF is set and r16 is loaded into r/m16. Else, clear ZF and load r/m16 into AX.

0F B1 /r
CMPXCHG r/m32, r32

MR

Valid

Valid*

Compare EAX with r/m32. If equal, ZF is set and r32 is loaded into r/m32. Else, clear ZF and load r/m32 into EAX.

REX.W + 0F B1 /r
CMPXCHG r/m64, r64

MR

Valid

N.E.

Compare RAX with r/m64. If equal, ZF is set and r64 is loaded into r/m64. Else, clear ZF and load r/m64 into RAX.


NOTES:

* See the IA-32 Architecture Compatibility section below.

** In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

MR 

ModRM:r/m (r, w) 

ModRM:reg (r) 

NA 

NA 


Description 

Compares the value in the AL, AX, EAX, or RAX register with the first operand (destination operand). If the two values are equal, the second operand (source operand) is loaded into the destination operand. Otherwise, the destination operand is loaded into the AL, AX, EAX or RAX register. The RAX register is available only in 64-bit mode.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the 
interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the 
comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is 
written into the destination. (The processor never produces a locked read without also producing a locked write.) 
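In C, LOCK CMPXCHG is usually reached through C11 atomics rather than written by hand. The sketch below assumes a C11 compiler that provides `<stdatomic.h>`; `atomic_store_max` is a hypothetical helper name. On x86-64, GCC and Clang typically lower `atomic_compare_exchange_weak` on an `int` to LOCK CMPXCHG, but that mapping is the compiler's choice, not something this manual specifies.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Atomically raise *target to value if value is larger (lock-free retry loop). */
static void atomic_store_max(atomic_int *target, int value)
{
    int expected = atomic_load(target);
    /* On failure, expected is refreshed with the current contents, much as
       CMPXCHG loads the destination into the accumulator when ZF = 0.
       The weak form may fail spuriously, so the loop simply retries. */
    while (expected < value &&
           !atomic_compare_exchange_weak(target, &expected, value)) {
        /* retry with the refreshed expected value */
    }
}
```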

In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. See the summary chart at the beginning of this section for encoding data and limits.

IA-32 Architecture Compatibility 

This instruction is not supported on Intel processors earlier than the Intel486 processors. 

Operation 

(* Accumulator = AL, AX, EAX, or RAX depending on whether a byte, word, doubleword, or quadword comparison is being performed *)
TEMP ← DEST
IF accumulator = TEMP
    THEN
        ZF ← 1;
        DEST ← SRC;
    ELSE
        ZF ← 0;
        accumulator ← TEMP;
        DEST ← TEMP;
FI;
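The pseudocode above can be transcribed into a sequential C function to make the data flow explicit, in particular that the destination is written on both paths. This is a hypothetical, deliberately non-atomic model of the 32-bit form (`cmpxchg32_model` is an invented name); it shows which locations are read and written and is not a substitute for LOCK CMPXCHG.

```c
#include <stdbool.h>
#include <stdint.h>

/* Sequential model of CMPXCHG r/m32, r32. Returns the new ZF value. */
static bool cmpxchg32_model(uint32_t *dest, uint32_t *eax, uint32_t src)
{
    uint32_t temp = *dest;          /* TEMP <- DEST */
    if (*eax == temp) {
        *dest = src;                /* ZF = 1: DEST <- SRC */
        return true;
    }
    *eax = temp;                    /* ZF = 0: accumulator <- TEMP */
    *dest = temp;                   /* destination written back unchanged */
    return false;
}
```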
Flags Affected

The ZF flag is set if the values in the destination operand and register AL, AX, or EAX are equal; otherwise it is 
cleared. The CF, PF, AF, SF, and OF flags are set according to the results of the comparison operation. 


Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 
CMPXCHG8B/CMPXCHG16B-Compare and Exchange Bytes


Opcode/Instruction

Op/En

64-Bit Mode

Compat/Leg Mode

Description

0F C7 /1 m64
CMPXCHG8B m64

M

Valid

Valid*

Compare EDX:EAX with m64. If equal, set ZF and load ECX:EBX into m64. Else, clear ZF and load m64 into EDX:EAX.

REX.W + 0F C7 /1 m128
CMPXCHG16B m128

M

Valid

N.E.

Compare RDX:RAX with m128. If equal, set ZF and load RCX:RBX into m128. Else, clear ZF and load m128 into RDX:RAX.


NOTES:

* See IA-32 Architecture Compatibility section below.


Instruction Operand Encoding

Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

M 

ModRM:r/m (r, w) 

NA 

NA 

NA 


Description 

Compares the 64-bit value in EDX:EAX (or 128-bit value in RDX:RAX if operand size is 128 bits) with the operand (destination operand). If the values are equal, the 64-bit value in ECX:EBX (or 128-bit value in RCX:RBX) is stored in the destination operand. Otherwise, the value in the destination operand is loaded into EDX:EAX (or RDX:RAX). The destination operand is an 8-byte memory location (or 16-byte memory location if operand size is 128 bits). For the EDX:EAX and ECX:EBX register pairs, EDX and ECX contain the high-order 32 bits and EAX and EBX contain the low-order 32 bits of a 64-bit value. For the RDX:RAX and RCX:RBX register pairs, RDX and RCX contain the high-order 64 bits and RAX and RBX contain the low-order 64 bits of a 128-bit value.

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. To simplify the 
interface to the processor's bus, the destination operand receives a write cycle without regard to the result of the 
comparison. The destination operand is written back if the comparison fails; otherwise, the source operand is 
written into the destination. (The processor never produces a locked read without also producing a locked write.) 

In 64-bit mode, default operation size is 64 bits. Use of the REX.W prefix promotes operation to 128 bits. Note that CMPXCHG16B requires that the destination (memory) operand be 16-byte aligned. See the summary chart at the beginning of this section for encoding data and limits. For information on the CPUID flag that indicates CMPXCHG16B, see page 3-206.

IA-32 Architecture Compatibility 

This instruction encoding is not supported on Intel processors earlier than the Pentium processors. 
Operation

IF (64-Bit Mode and OperandSize = 64)
    THEN
        TEMP128 ← DEST
        IF (RDX:RAX = TEMP128)
            THEN
                ZF ← 1;
                DEST ← RCX:RBX;
            ELSE
                ZF ← 0;
                RDX:RAX ← TEMP128;
                DEST ← TEMP128;
                FI;
        FI
    ELSE
        TEMP64 ← DEST;
        IF (EDX:EAX = TEMP64)
            THEN
                ZF ← 1;
                DEST ← ECX:EBX;
            ELSE
                ZF ← 0;
                EDX:EAX ← TEMP64;
                DEST ← TEMP64;
                FI;
        FI;
FI;
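The register-pair convention (EDX or ECX holds bits 63:32, EAX or EBX holds bits 31:0) is easiest to see in a transcription of the 64-bit path above. The sketch below is a hypothetical, non-atomic C model (`pair64` and `cmpxchg8b_model` are invented names) that passes the registers explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Combine a register pair such as EDX:EAX into one 64-bit value. */
static uint64_t pair64(uint32_t hi, uint32_t lo)
{
    return ((uint64_t)hi << 32) | lo;
}

/* Sequential model of the CMPXCHG8B path. Returns the new ZF value. */
static bool cmpxchg8b_model(uint64_t *dest,
                            uint32_t *edx, uint32_t *eax,
                            uint32_t ecx, uint32_t ebx)
{
    uint64_t temp = *dest;                     /* TEMP64 <- DEST */
    if (pair64(*edx, *eax) == temp) {
        *dest = pair64(ecx, ebx);              /* ZF = 1: DEST <- ECX:EBX */
        return true;
    }
    *edx = (uint32_t)(temp >> 32);             /* ZF = 0: EDX:EAX <- TEMP64 */
    *eax = (uint32_t)temp;
    *dest = temp;                              /* written back unchanged */
    return false;
}
```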

Flags Affected 

The ZF flag is set if the destination operand and EDX:EAX are equal; otherwise it is cleared. The CF, PF, AF, SF, and 
OF flags are unaffected. 

Protected Mode Exceptions 

#UD If the destination is not a memory operand. 

#GP(0) If the destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

Real-Address Mode Exceptions 

#UD If the destination operand is not a memory location. 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 
Virtual-8086 Mode Exceptions

#UD If the destination operand is not a memory location. 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

If memory operand for CMPXCHG16B is not aligned on a 16-byte boundary.

If CPUID.01H:ECX.CMPXCHG16B[bit 13] = 0. 

#UD If the destination operand is not a memory location. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.
COMISD—Compare Scalar Ordered Double-Precision Floating-Point Values and Set EFLAGS


Opcode/Instruction

Op/En

64/32 bit Mode Support

CPUID Feature Flag

Description

66 0F 2F /r
COMISD xmm1, xmm2/m64

RM

V/V

SSE2

Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.

VEX.128.66.0F.WIG 2F /r
VCOMISD xmm1, xmm2/m64

RM

V/V

AVX

Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.

EVEX.LIG.66.0F.W1 2F /r
VCOMISD xmm1, xmm2/m64{sae}

T1S

V/V

AVX512F

Compare low double-precision floating-point values in xmm1 and xmm2/mem64 and set the EFLAGS flags accordingly.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

RM 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 

T1S 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 


Description 

Compares the double-precision floating-point values in the low quadwords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN).

Operand 1 is an XMM register; operand 2 can be an XMM register or a 64-bit memory location. The COMISD instruction differs from the UCOMISD instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISD instruction signals an invalid numeric exception only if a source operand is an SNaN.

The EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated. 

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 

Software should ensure VCOMISD is encoded with VEX.L=0. Encoding VCOMISD with VEX.L=1 may encounter 
unpredictable behavior across different processor generations. 

Operation 

COMISD (all versions) 

RESULT ← OrderedCompare(DEST[63:0] <> SRC[63:0]) {
(* Set EFLAGS *) CASE (RESULT) OF
    UNORDERED: ZF,PF,CF ← 111;
    GREATER_THAN: ZF,PF,CF ← 000;
    LESS_THAN: ZF,PF,CF ← 001;
    EQUAL: ZF,PF,CF ← 100;
ESAC;
OF, AF, SF ← 0;}
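The Operation table maps each comparison outcome to a fixed ZF/PF/CF pattern. The sketch below is a hypothetical C model (`comisd_flags` and the `FLAG_*` constants are invented names) that returns those flags at their EFLAGS bit positions (CF bit 0, PF bit 2, AF bit 4, ZF bit 6, SF bit 7, OF bit 11). It does not model the #I exception or the case in which an unmasked exception leaves EFLAGS unchanged.

```c
#include <math.h>
#include <stdint.h>

/* EFLAGS status-flag bit masks. */
enum {
    FLAG_CF = 1u << 0, FLAG_PF = 1u << 2,  FLAG_AF = 1u << 4,
    FLAG_ZF = 1u << 6, FLAG_SF = 1u << 7,  FLAG_OF = 1u << 11
};

/* Status flags produced by COMISD DEST, SRC; OF, AF and SF are always 0. */
static uint32_t comisd_flags(double dest, double src)
{
    if (isnan(dest) || isnan(src))
        return FLAG_ZF | FLAG_PF | FLAG_CF;   /* UNORDERED:    111 */
    if (dest > src)
        return 0u;                            /* GREATER_THAN: 000 */
    if (dest < src)
        return FLAG_CF;                       /* LESS_THAN:    001 */
    return FLAG_ZF;                           /* EQUAL:        100 */
}
```

Note that +0.0 and -0.0 compare equal, so they produce ZF = 1.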
Intel C/C++ Compiler Intrinsic Equivalent

VCOMISD int _mm_comi_round_sd(__m128d a, __m128d b, int imm, int sae);

VCOMISD int _mm_comieq_sd (__m128d a, __m128d b)

VCOMISD int _mm_comilt_sd (__m128d a, __m128d b)

VCOMISD int _mm_comile_sd (__m128d a, __m128d b)

VCOMISD int _mm_comigt_sd (__m128d a, __m128d b)

VCOMISD int _mm_comige_sd (__m128d a, __m128d b)

VCOMISD int _mm_comineq_sd (__m128d a, __m128d b)

SIMD Floating-Point Exceptions 

Invalid (if SNaN or QNaN operands), Denormal. 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; 

EVEX-encoded instructions, see Exceptions Type E3NF. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
COMISS—Compare Scalar Ordered Single-Precision Floating-Point Values and Set EFLAGS


Opcode/Instruction

Op/En

64/32 bit Mode Support

CPUID Feature Flag

Description

0F 2F /r
COMISS xmm1, xmm2/m32

RM

V/V

SSE

Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.

VEX.128.0F.WIG 2F /r
VCOMISS xmm1, xmm2/m32

RM

V/V

AVX

Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.

EVEX.LIG.0F.W0 2F /r
VCOMISS xmm1, xmm2/m32{sae}

T1S

V/V

AVX512F

Compare low single-precision floating-point values in xmm1 and xmm2/mem32 and set the EFLAGS flags accordingly.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

RM 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 

T1S 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 


Description 

Compares the single-precision floating-point values in the low doublewords of operand 1 (first operand) and operand 2 (second operand), and sets the ZF, PF, and CF flags in the EFLAGS register according to the result (unordered, greater than, less than, or equal). The OF, SF and AF flags in the EFLAGS register are set to 0. The unordered result is returned if either source operand is a NaN (QNaN or SNaN).

Operand 1 is an XMM register; operand 2 can be an XMM register or a 32-bit memory location.

The COMISS instruction differs from the UCOMISS instruction in that it signals a SIMD floating-point invalid operation exception (#I) when a source operand is either a QNaN or SNaN. The UCOMISS instruction signals an invalid numeric exception only if a source operand is an SNaN.

The EFLAGS register is not updated if an unmasked SIMD floating-point exception is generated. 

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 

Software should ensure VCOMISS is encoded with VEX.L=0. Encoding VCOMISS with VEX.L=1 may encounter 
unpredictable behavior across different processor generations. 

Operation 

COMISS (all versions) 

RESULT ← OrderedCompare(DEST[31:0] <> SRC[31:0]) {
(* Set EFLAGS *) CASE (RESULT) OF
    UNORDERED: ZF,PF,CF ← 111;
    GREATER_THAN: ZF,PF,CF ← 000;
    LESS_THAN: ZF,PF,CF ← 001;
    EQUAL: ZF,PF,CF ← 100;
ESAC;
OF, AF, SF ← 0;}
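Because COMISS reports its result only through ZF, PF, and CF, code normally consumes it with an unsigned-style condition (Jcc, SETcc, or CMOVcc). The sketch below is a hypothetical C model (`comi_flags`, `comiss_flags`, and the `cond_*` helpers are invented names) that applies the standard condition-code definitions (A: CF=0 and ZF=0; AE: CF=0; B: CF=1; E: ZF=1; P: PF=1) to the flag patterns from the Operation section. It shows that A and AE give the IEEE answer for > and >= even with a NaN operand, whereas B and E are also satisfied by the unordered pattern (111), so a test for < or == needs swapped operands or an extra PF check.

```c
#include <math.h>
#include <stdbool.h>

/* The three status flags COMISS writes. */
typedef struct { bool zf, pf, cf; } comi_flags;

static comi_flags comiss_flags(float dest, float src)
{
    if (isnan(dest) || isnan(src)) return (comi_flags){ true,  true,  true  };
    if (dest > src)                return (comi_flags){ false, false, false };
    if (dest < src)                return (comi_flags){ false, false, true  };
    return                                (comi_flags){ true,  false, false };
}

/* Condition codes as evaluated by Jcc/SETcc/CMOVcc. */
static bool cond_a (comi_flags f) { return !f.cf && !f.zf; } /* above         */
static bool cond_ae(comi_flags f) { return !f.cf; }          /* above/equal   */
static bool cond_b (comi_flags f) { return f.cf; }           /* below         */
static bool cond_e (comi_flags f) { return f.zf; }           /* equal         */
static bool cond_p (comi_flags f) { return f.pf; }           /* parity (unord) */
```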
Intel C/C++ Compiler Intrinsic Equivalent

VCOMISS int _mm_comi_round_ss(__m128 a, __m128 b, int imm, int sae);

VCOMISS int _mm_comieq_ss (__m128 a, __m128 b)

VCOMISS int _mm_comilt_ss (__m128 a, __m128 b)

VCOMISS int _mm_comile_ss (__m128 a, __m128 b)

VCOMISS int _mm_comigt_ss (__m128 a, __m128 b)

VCOMISS int _mm_comige_ss (__m128 a, __m128 b)

VCOMISS int _mm_comineq_ss (__m128 a, __m128 b)

SIMD Floating-Point Exceptions 

Invalid (if SNaN or QNaN operands), Denormal. 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; 

EVEX-encoded instructions, see Exceptions Type E3NF. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
CPUID—CPU Identification


Opcode

Instruction

Op/En

64-Bit Mode

Compat/Leg Mode

Description

0F A2

CPUID

NP

Valid

Valid

Returns processor identification and feature information to the EAX, EBX, ECX, and EDX registers, as determined by input entered in EAX (in some cases, ECX as well).


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

NP 

NA 

NA 

NA 

NA 


Description 

The ID flag (bit 21) in the EFLAGS register indicates support for the CPUID instruction. If a software procedure can set and clear this flag, the processor executing the procedure supports the CPUID instruction. This instruction operates the same in non-64-bit modes and 64-bit mode.

CPUID returns processor identification and feature information in the EAX, EBX, ECX, and EDX registers.¹ The instruction's output is dependent on the contents of the EAX register upon execution (in some cases, ECX as well). For example, the following pseudocode loads EAX with 00H and causes CPUID to return a Maximum Return Value and the Vendor Identification String in the appropriate registers:

MOV EAX, 00H
CPUID 
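For leaf 00H, the 12-character vendor identification string is spread across EBX, EDX, and ECX (in that order, see Table 3-8), with each register holding four ASCII bytes lowest byte first. The sketch below decodes the string from raw register values so that it runs on any platform; `cpuid_vendor_string` is a hypothetical name. On GCC and Clang for x86, the register values themselves can be obtained with `__get_cpuid` from `<cpuid.h>`, which this sketch deliberately does not call.

```c
#include <stdint.h>

/* Build the vendor string from CPUID.00H register values: EBX, then EDX, then ECX. */
static void cpuid_vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    const uint32_t regs[3] = { ebx, edx, ecx };
    for (int r = 0; r < 3; r++)
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);  /* low byte first */
    out[12] = '\0';
}
```

With EBX = 756E6547H ("Genu"), EDX = 49656E69H ("ineI"), and ECX = 6C65746EH ("ntel"), the result is "GenuineIntel".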

Table 3-8 shows information returned, depending on the initial value loaded into the EAX register. 

Two types of information are returned: basic and extended function information. If a value entered for CPUID.EAX is higher than the maximum input value for basic or extended function for that processor, then the data for the highest basic information leaf is returned. For example, using the Intel Core i7 processor, the following is true:

CPUID.EAX = 05H (* Returns MONITOR/MWAIT leaf. *)

CPUID.EAX = 0AH (* Returns Architectural Performance Monitoring leaf. *)

CPUID.EAX = 0BH (* Returns Extended Topology Enumeration leaf. *)

CPUID.EAX = 0CH (* INVALID: Returns the same information as CPUID.EAX = 0BH. *)

CPUID.EAX = 80000008H (* Returns linear/physical address size data. *)

CPUID.EAX = 8000000AH (* INVALID: Returns same information as CPUID.EAX = 0BH. *)

If a value entered for CPUID.EAX is less than or equal to the maximum input value and the leaf is not supported on 
that processor then 0 is returned in all the registers. 

When CPUID returns the highest basic leaf information as a result of an invalid input EAX value, any dependence 
on input ECX value in the basic leaf is honored. 
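The out-of-range rule above can be summarized as a small function. The sketch below is a hypothetical model (`cpuid_effective_leaf` is an invented name); it assumes the caller already knows the maximum basic leaf (CPUID.00H:EAX) and the maximum extended leaf (reported by CPUID.80000000H:EAX). It does not model leaves that are within range but unsupported, for which all registers return 0. The test values 0BH and 80000008H are assumptions chosen to be consistent with the Core i7 example, not figures stated in it.

```c
#include <stdint.h>

/* Leaf whose data CPUID reports for a requested EAX value: requests beyond
   the basic or extended maximum fall back to the highest basic leaf. */
static uint32_t cpuid_effective_leaf(uint32_t eax, uint32_t max_basic, uint32_t max_ext)
{
    if (eax >= 0x80000000u)                       /* extended range */
        return eax <= max_ext ? eax : max_basic;
    return eax <= max_basic ? eax : max_basic;    /* basic range */
}
```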

CPUID can be executed at any privilege level to serialize instruction execution. Serializing instruction execution 
guarantees that any modifications to flags, registers, and memory for previous instructions are completed before 
the next instruction is fetched and executed. 

See also:

"Serializing Instructions" in Chapter 8, "Multiple-Processor Management," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.

"Caching Translation Information" in Chapter 4, "Paging," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A.


1. On Intel 64 processors, CPUID clears the high 32 bits of the RAX/RBX/RCX/RDX registers in all modes. 
Table 3-8. Information Returned by CPUID Instruction


Initial EAX 
Value 

Information Provided about the Processor 

Basic CPUID Information 

0H

EAX Maximum Input Value for Basic CPUID Information. 

EBX "Genu" 

ECX "ntel" 

EDX "ineI"

01H 

EAX Version Information: Type, Family, Model, and Stepping ID (see Figure 3-6). 

EBX Bits 07 - 00: Brand Index. 

Bits 15 - 08: CLFLUSH line size (Value * 8 = cache line size in bytes; used also by CLFLUSHOPT).

Bits 23-16: Maximum number of addressable IDs for logical processors in this physical package*. 

Bits 31 - 24: Initial APIC ID. 

ECX Feature Information (see Figure 3-7 and Table 3-10). 

EDX Feature Information (see Figure 3-8 and Table 3-11).

NOTES: 

* The nearest power-of-2 integer that is not smaller than EBX[23:16] is the number of unique initial APIC
IDs reserved for addressing different logical processors in a physical package. This field is only valid if
CPUID.1.EDX.HTT[bit 28] = 1.

02H 

EAX Cache and TLB Information (see Table 3-12). 

EBX Cache and TLB Information. 

ECX Cache and TLB Information. 

EDX Cache and TLB Information. 

03H 

EAX Reserved. 

EBX Reserved. 

ECX Bits 00 - 31 of 96-bit processor serial number. (Available in Pentium III processor only; otherwise, the
value in this register is reserved.)

EDX Bits 32 - 63 of 96-bit processor serial number. (Available in Pentium III processor only; otherwise, the
value in this register is reserved.)

NOTES: 

Processor serial number (PSN) is not supported in the Pentium 4 processor or later. On all models, use 
the PSN flag (returned using CPUID) to check for PSN support before accessing the feature. 

CPUID leaves above 2 and below 80000000H are visible only when IA32_MISC_ENABLE[bit 22] has its default value of 0. 

Deterministic Cache Parameters Leaf 

04H 

NOTES: 

Leaf 04H output depends on the initial value in ECX.* 

See also: "INPUT EAX = 04H: Returns Deterministic Cache Parameters for Each Level" on page 214. 

EAX Bits 04 - 00: Cache Type Field. 

0 = Null - No more caches. 

1 = Data Cache. 

2 = Instruction Cache. 

3 = Unified Cache. 

4-31 = Reserved. 


Bits 07 - 05: Cache Level (starts at 1). 

Bit 08: Self Initializing cache level (does not need SW initialization). 

Bit 09: Fully Associative cache. 

Bits 13-10: Reserved. 

Bits 25-14: Maximum number of addressable IDs for logical processors sharing this cache**, ***. 

Bits 31 - 26: Maximum number of addressable IDs for processor cores in the physical package**, ****, *****.

EBX Bits 11 - 00: L = System Coherency Line Size**. 

Bits 21 -12: P = Physical Line partitions**. 

Bits 31 - 22: W = Ways of associativity**. 

ECX Bits 31 -00: S = Number of Sets**. 

EDX Bit 00: Write-Back Invalidate/Invalidate. 

0 = WBINVD/INVD from threads sharing this cache acts upon lower level caches for threads sharing this 
cache. 

1 = WBINVD/INVD is not guaranteed to act upon lower level caches of non-originating threads sharing 
this cache. 

Bit 01: Cache Inclusiveness. 

0 = Cache is not inclusive of lower cache levels. 

1 = Cache is inclusive of lower cache levels. 

Bit 02: Complex Cache Indexing. 

0 = Direct mapped cache. 

1 = A complex function is used to index the cache, potentially using all address bits. 

Bits 31 - 03: Reserved = 0. 

NOTES: 

* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf index n+1 is invalid if sub-
leaf n returns EAX[4:0] as 0.

** Add one to the return value to get the result. 

*** The nearest power-of-2 integer that is not smaller than (1 + EAX[25:14]) is the number of unique
initial APIC IDs reserved for addressing different logical processors sharing this cache.

**** The nearest power-of-2 integer that is not smaller than (1 + EAX[31:26]) is the number of unique
Core_IDs reserved for addressing different processor cores in a physical package. Core ID is a subset of
bits of the initial APIC ID.

***** The returned value is constant for valid initial values in ECX. Valid ECX values start from 0. 


MONITOR/MWAIT Leaf 

05H

EAX Bits 15-00: Smallest monitor-line size in bytes (default is processor's monitor granularity). 

Bits 31 -16: Reserved = 0. 

EBX Bits 15-00: Largest monitor-line size in bytes (default is processor's monitor granularity). 

Bits 31 -16: Reserved = 0. 

ECX Bit 00: Enumeration of Monitor-Mwait extensions (beyond EAX and EBX registers) supported. 

Bit 01: Supports treating interrupts as break-event for MWAIT, even when interrupts are disabled.

Bits 31 - 02: Reserved. 

EDX Bits 03 - 00: Number of C0* sub C-states supported using MWAIT.

Bits 07 - 04: Number of C1* sub C-states supported using MWAIT.

Bits 11-08: Number of C2* sub C-states supported using MWAIT. 

Bits 15-12: Number of C3* sub C-states supported using MWAIT. 

Bits 19-16: Number of C4* sub C-states supported using MWAIT. 

Bits 23 - 20: Number of C5* sub C-states supported using MWAIT. 

Bits 27 - 24: Number of C6* sub C-states supported using MWAIT. 

Bits 31 - 28: Number of C7* sub C-states supported using MWAIT. 

NOTE: 

* The definitions of C0 through C7 states for MWAIT extension are processor-specific C-states, not ACPI
C-states.


Thermal and Power Management Leaf 


06H

EAX Bit 00: Digital temperature sensor is supported if set.

Bit 01: Intel Turbo Boost Technology Available (see description of IA32_MISC_ENABLE[38]).

Bit 02: ARAT. APIC-Timer-always-running feature is supported if set.

Bit 03: Reserved.

Bit 04: PLN. Power limit notification controls are supported if set.

Bit 05: ECMD. Clock modulation duty cycle extension is supported if set.

Bit 06: PTM. Package thermal management is supported if set.

Bit 07: HWP. HWP base registers (IA32_PM_ENABLE[bit 0], IA32_HWP_CAPABILITIES,
IA32_HWP_REQUEST, IA32_HWP_STATUS) are supported if set.

Bit 08: HWP_Notification. IA32_HWP_INTERRUPT MSR is supported if set.

Bit 09: HWP_Activity_Window. IA32_HWP_REQUEST[bits 41:32] is supported if set.

Bit 10: HWP_Energy_Performance_Preference. IA32_HWP_REQUEST[bits 31:24] is supported if set.

Bit 11: HWP_Package_Level_Request. IA32_HWP_REQUEST_PKG MSR is supported if set.

Bit 12: Reserved.

Bit 13: HDC. HDC base registers IA32_PKG_HDC_CTL, IA32_PM_CTL1, IA32_THREAD_STALL MSRs are
supported if set.

Bits 31 - 15: Reserved.

EBX Bits 03 - 00: Number of Interrupt Thresholds in Digital Thermal Sensor.

Bits 31 - 04: Reserved.

ECX Bit 00: Hardware Coordination Feedback Capability (Presence of IA32_MPERF and IA32_APERF). The
capability to provide a measure of delivered processor performance (since last reset of the counters), as
a percentage of the expected processor performance when running at the TSC frequency.

Bits 02 - 01: Reserved = 0.

Bit 03: The processor supports performance-energy bias preference if CPUID.06H:ECX.SETBH[bit 3] is set
and it also implies the presence of a new architectural MSR called IA32_ENERGY_PERF_BIAS (1B0H).

Bits 31 - 04: Reserved = 0.

EDX Reserved = 0.


CPUID-CPU Identification 


Vol.2A 3-193 








INSTRUCTION SET REFERENCE, A-L 



Structured Extended Feature Flags Enumeration Leaf (Output depends on ECX input value) 

07H 

Sub-leaf 0 (Input ECX = 0). * 


EAX Bits 31 - 00: Reports the maximum input value for supported leaf 7 sub-leaves. 

EBX Bit 00: FSGSBASE. Supports RDFSBASE/RDGSBASE/WRFSBASE/WRGSBASE if 1.

Bit 01: IA32_TSC_ADJUST MSR is supported if 1. 

Bit 02: SGX. Supports Intel® Software Guard Extensions (Intel® SGX Extensions) if 1.

Bit 03: BMI1.

Bit 04: HLE. 

Bit 05: AVX2. 

Bit 06: FDP_EXCPTN_ONLY. x87 FPU Data Pointer updated only on x87 exceptions if 1. 

Bit 07: SMEP. Supports Supervisor-Mode Execution Prevention if 1. 

Bit 08: BMI2. 

Bit 09: Supports Enhanced REP MOVSB/STOSB if 1. 

Bit 10: INVPCID. If 1, supports INVPCID instruction for system software that manages process-context 
identifiers. 

Bit 11: RTM. 

Bit 12: RDT-M. Supports Intel® Resource Director Technology (Intel® RDT) Monitoring capability if 1.

Bit 13: Deprecates FPU CS and FPU DS values if 1.

Bit 14: MPX. Supports Intel® Memory Protection Extensions if 1.

Bit 15: RDT-A. Supports Intel® Resource Director Technology (Intel® RDT) Allocation capability if 1.

Bits 17:16: Reserved. 

Bit 18: RDSEED. 

Bit 19: ADX. 

Bit 20: SMAP. Supports Supervisor-Mode Access Prevention (and the CLAC/STAC instructions) if 1. 

Bits 22 - 21: Reserved. 

Bit 23: CLFLUSHOPT. 

Bit 24: CLWB. 

Bit 25: Intel Processor Trace. 

Bits 28 - 26: Reserved. 

Bit 29: SHA. Supports Intel® Secure Hash Algorithm Extensions (Intel® SHA Extensions) if 1.

Bits 31 - 30: Reserved. 

ECX Bit 00: PREFETCHWT1.

Bit 01: Reserved. 

Bit 02: UMIP. Supports user-mode instruction prevention if 1. 

Bit 03: PKU. Supports protection keys for user-mode pages if 1. 

Bit 04: OSPKE. If 1, OS has set CR4.PKE to enable protection keys (and the RDPKRU/WRPKRU instruc-
tions).

Bits 16-5: Reserved. 

Bits 21 -17: The value of MAWAU used by the BNDLDX and BNDSTX instructions in 64-bit mode. 

Bit 22: RDPID. Supports Read Processor ID if 1. 

Bits 29 - 23: Reserved. 

Bit 30: SGX_LC. Supports SGX Launch Configuration if 1. 

Bit 31: Reserved. 

EDX Reserved. 

NOTE: 

* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf index n is invalid if n 
exceeds the value that sub-leaf 0 returns in EAX. 

Direct Cache Access Information Leaf 

09H 

EAX 

Value of bits [31:0] of IA32_PLATFORM_DCA_CAP MSR (address 1F8H). 


EBX 

Reserved. 


ECX 

Reserved. 


EDX 

Reserved. 

Architectural Performance Monitoring Leaf 

0AH

EAX 

Bits 07 - 00: Version ID of architectural performance monitoring. 

Bits 15 - 08: Number of general-purpose performance monitoring counters per logical processor.

Bits 23-16: Bit width of general-purpose, performance monitoring counter. 

Bits 31 - 24: Length of EBX bit vector to enumerate architectural performance monitoring events. 


EBX 

Bit 00: Core cycle event not available if 1. 

Bit 01: Instruction retired event not available if 1. 

Bit 02: Reference cycles event not available if 1. 

Bit 03: Last-level cache reference event not available if 1. 

Bit 04: Last-level cache misses event not available if 1. 

Bit 05: Branch instruction retired event not available if 1. 

Bit 06: Branch mispredict retired event not available if 1. 

Bits 31 - 07: Reserved = 0. 


ECX 

Reserved = 0. 


EDX 

Bits 04 - 00: Number of fixed-function performance counters (if Version ID > 1). 

Bits 12-05: Bit width of fixed-function performance counters (if Version ID > 1). 

Reserved = 0. 

Extended Topology Enumeration Leaf 

0BH


NOTES:

Most of Leaf 0BH output depends on the initial value in ECX.

The EDX output of leaf 0BH is always valid and does not vary with input value in ECX.

Output value in ECX[7:0] always equals input value in ECX[7:0]. 

For sub-leaves that return an invalid level-type of 0 in ECX[15:8], EAX and EBX will return 0.

If an input value n in ECX returns the invalid level-type of 0 in ECX[15:8], other input values with ECX > 
n also return 0 in ECX[15:8]. 


EAX 

Bits 04 - 00: Number of bits to shift right on x2APIC ID to get a unique topology ID of the next level type*. 
All logical processors with the same next level ID share current level. 

Bits 31 - 05: Reserved. 


EBX 

Bits 15-00: Number of logical processors at this level type. The number reflects configuration as shipped 
by Intel**. 

Bits 31-16: Reserved. 


ECX 

Bits 07 - 00: Level number. Same value in ECX input. 

Bits 15-08: Level type***. 

Bits 31 -16: Reserved. 


EDX 

Bits 31 - 00: x2APIC ID of the current logical processor.

NOTES: 

* Software should use this field (EAX[4:0]) to enumerate processor topology of the system. 


** Software must not use EBX[15:0] to enumerate processor topology of the system. This value in this 
field (EBX[15:0]) is only intended for display/diagnostic purposes. The actual number of logical processors 
available to BIOS/OS/Applications may be different from the value of EBX[15:0], depending on software 
and platform hardware configurations. 

*** The value of the "level type" field is not related to level numbers in any way; higher "level type"
values do not mean higher levels. Level type field has the following encoding:

0: Invalid. 

1: SMT. 

2: Core. 

3-255: Reserved. 

Processor Extended State Enumeration Main Leaf (EAX = 0DH, ECX = 0)

0DH

NOTES:

Leaf 0DH main leaf (ECX = 0).

EAX Bits 31 - 00: Reports the supported bits of the lower 32 bits of XCR0. XCR0[n] can be set to 1 only if
EAX[n] is 1.

Bit 00: x87 state. 

Bit 01: SSE state. 

Bit 02: AVX state. 

Bits 04 - 03: MPX state. 

Bits 07 - 05: AVX-512 state.

Bit 08: Used for IA32_XSS. 

Bit 09: PKRU state. 

Bits 31 -10: Reserved. 

EBX Bits 31 - 00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) required by
enabled features in XCR0. May be different than ECX if some features at the end of the XSAVE save area
are not enabled.

ECX Bits 31 - 00: Maximum size (bytes, from the beginning of the XSAVE/XRSTOR save area) of the
XSAVE/XRSTOR save area required by all supported features in the processor, i.e., all the valid bit fields in
XCR0.

EDX Bits 31 - 00: Reports the supported bits of the upper 32 bits of XCR0. XCR0[n+32] can be set to 1 only if
EDX[n] is 1.

Bits 31 - 00: Reserved. 

Processor Extended State Enumeration Sub-leaf (EAX = 0DH, ECX = 1)

0DH

EAX Bit 00: XSAVEOPT is available. 

Bit 01: Supports XSAVEC and the compacted form of XRSTOR if set. 

Bit 02: Supports XGETBV with ECX = 1 if set. 

Bit 03: Supports XSAVES/XRSTORS and IA32_XSS if set. 

Bits 31 - 04: Reserved. 

EBX Bits 31 - 00: The size in bytes of the XSAVE area containing all states enabled by XCR0 | IA32_XSS.

ECX Bits 31 - 00: Reports the supported bits of the lower 32 bits of the IA32_XSS MSR. IA32_XSS[n] can be
set to 1 only if ECX[n] is 1.

Bits 07 - 00: Used for XCR0.

Bit 08: PT state.

Bit 09: Used for XCR0.

Bits 31 - 10: Reserved.

EDX Bits 31 - 00: Reports the supported bits of the upper 32 bits of the IA32_XSS MSR. IA32_XSS[n+32] can
be set to 1 only if EDX[n] is 1.

Bits 31 - 00: Reserved. 

Processor Extended State Enumeration Sub-leaves (EAX = 0DH, ECX = n, n > 1)

0DH

NOTES:

Leaf 0DH output depends on the initial value in ECX.

Each sub-leaf index (starting at position 2) is supported if it corresponds to a supported bit in either the
XCR0 register or the IA32_XSS MSR.

* If ECX contains an invalid sub-leaf index, EAX/EBX/ECX/EDX return 0. Sub-leaf n (0 <= n <= 31) is invalid
if sub-leaf 0 returns 0 in EAX[n] and sub-leaf 1 returns 0 in ECX[n]. Sub-leaf n (32 <= n <= 63) is invalid if
sub-leaf 0 returns 0 in EDX[n-32] and sub-leaf 1 returns 0 in EDX[n-32].

EAX Bits 31 - 00: The size in bytes (from the offset specified in EBX) of the save area for an extended state
feature associated with a valid sub-leaf index, n.

EBX Bits 31 - 00: The offset in bytes of this extended state component's save area from the beginning of the
XSAVE/XRSTOR area.

This field reports 0 if the sub-leaf index, n, does not map to a valid bit in the XCR0 register*.

ECX Bit 00 is set if the bit n (corresponding to the sub-leaf index) is supported in the IA32_XSS MSR; it is clear
if bit n is instead supported in XCR0.

Bit 01 is set if, when the compacted format of an XSAVE area is used, this extended state component is
located on the next 64-byte boundary following the preceding state component (otherwise, it is located
immediately following the preceding state component).

Bits 31-02 are reserved. 

This field reports 0 if the sub-leaf index, n, is invalid*. 

EDX This field reports 0 if the sub-leaf index, n, is invalid*; otherwise it is reserved. 

Intel Resource Director Technology (Intel RDT) Monitoring Enumeration Sub-leaf (EAX = 0FH, ECX = 0)

0FH

NOTES:

Leaf 0FH output depends on the initial value in ECX.

Sub-leaf index 0 reports valid resource type starting at bit position 1 of EDX. 

EAX Reserved. 

EBX Bits 31 - 00: Maximum range (zero-based) of RMID within this physical processor of all types. 

ECX Reserved. 

EDX Bit 00: Reserved. 

Bit 01: Supports L3 Cache Intel RDT Monitoring if 1. 

Bits 31 - 02: Reserved. 

L3 Cache Intel RDT Monitoring Capability Enumeration Sub-leaf (EAX = 0FH, ECX = 1)

0FH

NOTES:

Leaf 0FH output depends on the initial value in ECX.

EAX Reserved. 

EBX Bits 31 - 00: Conversion factor from reported IA32_QM_CTR value to occupancy metric (bytes). 

ECX Maximum range (zero-based) of RMID of this resource type. 

EDX Bit 00: Supports L3 occupancy monitoring if 1. 

Bit 01: Supports L3 Total Bandwidth monitoring if 1. 

Bit 02: Supports L3 Local Bandwidth monitoring if 1. 

Bits 31 - 03: Reserved. 

Intel Resource Director Technology (Intel RDT) Allocation Enumeration Sub-leaf (EAX = 10H, ECX = 0)

10H

NOTES:

Leaf 10H output depends on the initial value in ECX.

Sub-leaf index 0 reports valid resource identification (ResID) starting at bit position 1 of EBX.

EAX Reserved.


EBX 

Bit 00: Reserved. 

Bit 01: Supports L3 Cache Allocation Technology if 1. 

Bit 02: Supports L2 Cache Allocation Technology if 1. 

Bits 31 - 03: Reserved. 


ECX 

Reserved. 


EDX 

Reserved. 

L3 Cache Allocation Technology Enumeration Sub-leaf (EAX = 10H, ECX = ResID = 1)

10H 


NOTES: 

Leaf 10H output depends on the initial value in ECX. 


EAX 

Bits 04 - 00: Length of the capacity bit mask for the corresponding ResID using minus-one notation.

Bits 31 - 05: Reserved. 


EBX 

Bits 31 - 00: Bit-granular map of isolation/contention of allocation units. 


ECX 

Bit 00: Reserved. 

Bit 01: Updates of COS should be infrequent if 1. 

Bit 02: Code and Data Prioritization Technology supported if 1. 

Bits 31 - 03: Reserved. 


EDX 

Bits 15-00: Highest COS number supported for this ResID. 

Bits 31 -16: Reserved. 

L2 Cache Allocation Technology Enumeration Sub-leaf (EAX = 10H, ECX = ResID =2) 

10H 


NOTES: 

Leaf 10H output depends on the initial value in ECX. 


EAX 

Bits 04 - 00: Length of the capacity bit mask for the corresponding ResID using minus-one notation.

Bits 31 - 05: Reserved. 


EBX 

Bits 31 - 00: Bit-granular map of isolation/contention of allocation units. 


ECX 

Bits 31 - 00: Reserved. 


EDX 

Bits 15-00: Highest COS number supported for this ResID. 

Bits 31 -16: Reserved. 

Intel SGX Capability Enumeration Leaf, sub-leaf 0 (EAX = 12H, ECX = 0)

12H 


NOTES: 

Leaf 12H sub-leaf 0 (ECX = 0) is supported if CPUID.(EAX=07H, ECX=0H):EBX[SGX] = 1.


EAX 

Bit 00: SGX1. If 1, indicates Intel SGX supports the collection of SGX1 leaf functions.

Bit 01: SGX2. If 1, indicates Intel SGX supports the collection of SGX2 leaf functions.

Bits 31 - 02: Reserved.


EBX 

Bit 31 - 00: MISCSELECT. Bit vector of supported extended SGX features. 


ECX 

Bit 31 - 00: Reserved. 


EDX Bits 07 - 00: MaxEnclaveSize_Not64. The maximum supported enclave size in non-64-bit mode is
2^(EDX[7:0]).

Bits 15 - 08: MaxEnclaveSize_64. The maximum supported enclave size in 64-bit mode is 2^(EDX[15:8]).

Bits 31 - 16: Reserved.

Intel SGX Attributes Enumeration Leaf, sub-leaf 1 (EAX = 12H, ECX = 1) 

12H 

NOTES: 

Leaf 12H sub-leaf 1 (ECX = 1) is supported if CPUID.(EAX=07H, ECX=0H):EBX[SGX] = 1.

EAX Bit 31 - 00: Reports the valid bits of SECS.ATTRIBUTES[31:0] that software can set with ECREATE. 

EBX Bit 31 - 00: Reports the valid bits of SECS.ATTRIBUTES[63:32] that software can set with ECREATE. 

ECX Bit 31 - 00: Reports the valid bits of SECS.ATTRIBUTES[95:64] that software can set with ECREATE. 

EDX Bit 31 - 00: Reports the valid bits of SECS.ATTRIBUTES[127:96] that software can set with ECREATE. 

Intel SGX EPC Enumeration Leaf sub-leaves (EAX = 12H, ECX = 2 or higher) 

12H 

NOTES: 

Leaf 12H sub-leaf 2 or higher (ECX >= 2) is supported if CPUID.(EAX=07H, ECX=0H):EBX[SGX] = 1.

For sub-leaves (ECX = 2 or higher), definition of EDX,ECX,EBX,EAX[31:4] depends on the sub-leaf type 
listed below. 

EAX Bit 03 - 00: Sub-leaf Type 

0000b: Indicates this sub-leaf is invalid. 

0001b: This sub-leaf enumerates an EPC section. EBX:EAX and EDX:ECX provide information on the
Enclave Page Cache (EPC) section. 

All other type encodings are reserved. 

Type 0000b. This sub-leaf is invalid. 

EDX:ECX:EBX:EAX return 0. 

Type 0001b. This sub-leaf enumerates an EPC section with EDX:ECX, EBX:EAX defined as follows.

EAX[11:04]: Reserved (enumerate 0). 

EAX[31:12]: Bits 31:12 of the physical address of the base of the EPC section. 

EBX[19:00]: Bits 51:32 of the physical address of the base of the EPC section. 

EBX[31:20]: Reserved. 

ECX[03:00]: EPC section property encoding defined as follows: 

If ECX[3:0] = 0000b, then all bits of the EDX:ECX pair are enumerated as 0.

If ECX[3:0] = 0001b, then this section has confidentiality and integrity protection.

All other encodings are reserved. 

ECX[11:04]: Reserved (enumerate 0). 

ECX[31:12]: Bits 31:12 of the size of the corresponding EPC section within the Processor Reserved 
Memory. 

EDX[19:00]: Bits 51:32 of the size of the corresponding EPC section within the Processor Reserved 
Memory. 

EDX[31:20]: Reserved. 

Intel Processor Trace Enumeration Main Leaf (EAX = 14H, ECX = 0) 

14H 

NOTES: 

Leaf 14H main leaf (ECX = 0). 

EAX Bits 31 - 00: Reports the maximum sub-leaf supported in leaf 14H. 

EBX Bit 00: If 1, indicates that IA32_RTIT_CTL.CR3Filter can be set to 1, and that IA32_RTIT_CR3_MATCH
MSR can be accessed.

Bit 01: If 1, indicates support of Configurable PSB and Cycle-Accurate Mode. 

Bit 02: If 1, indicates support of IP Filtering, TraceStop filtering, and preservation of Intel PT MSRs across 
warm reset. 

Bit 03: If 1, indicates support of MTC timing packet and suppression of COFI-based packets. 

Bit 04: If 1, indicates support of PTWRITE. Writes can set IA32_RTIT_CTL[12] (PTWEn) and 
IA32_RTIT_CTL[5] (FUPonPTW), and PTWRITE can generate packets. 

Bit 05: If 1, indicates support of Power Event Trace. Writes can set IA32_RTIT_CTL[4] (PwrEvtEn), 
enabling Power Event Trace packet generation. 

Bit 31 - 06: Reserved. 

ECX Bit 00: If 1, tracing can be enabled with IA32_RTIT_CTL.ToPA = 1, hence utilizing the ToPA output
scheme; IA32_RTIT_OUTPUT_BASE and IA32_RTIT_OUTPUT_MASK_PTRS MSRs can be accessed.

Bit 01: If 1, ToPA tables can hold any number of output entries, up to the maximum allowed by the
MaskOrTableOffset field of IA32_RTIT_OUTPUT_MASK_PTRS.

Bit 02: If 1, indicates support of Single-Range Output scheme. 

Bit 03: If 1, indicates support of output to Trace Transport subsystem. 

Bit 30 - 04: Reserved. 

Bit 31: If 1, generated packets which contain IP payloads have LIP values, which include the CS base
component.

EDX Bits 31 - 00: Reserved. 

Intel Processor Trace Enumeration Sub-leaf (EAX = 14H, ECX = 1) 

14H 

EAX Bits 02 - 00: Number of configurable Address Ranges for filtering. 

Bits 15-03: Reserved. 

Bits 31 -16: Bitmap of supported MTC period encodings. 

EBX Bits 15-00: Bitmap of supported Cycle Threshold value encodings. 

Bits 31 - 16: Bitmap of supported Configurable PSB frequency encodings.

ECX Bits 31 - 00: Reserved. 

EDX Bits 31 - 00: Reserved. 

Time Stamp Counter and Nominal Core Crystal Clock Information Leaf 

15H 

NOTES: 

If EBX[31:0] is 0, the TSC/"core crystal clock" ratio is not enumerated. 

EBX[31:0]/EAX[31:0] indicates the ratio of the TSC frequency and the core crystal clock frequency. 

If ECX is 0, the nominal core crystal clock frequency is not enumerated. 

"TSC frequency" = "core crystal clock frequency" * EBX/EAX. 

The core crystal clock may differ from the reference clock, bus clock, or core clock frequencies. 

EAX Bits 31 - 00: An unsigned integer which is the denominator of the TSC/"core crystal clock" ratio. 

EBX Bits 31 - 00: An unsigned integer which is the numerator of the TSC/"core crystal clock" ratio. 

ECX Bits 31 - 00: An unsigned integer which is the nominal frequency of the core crystal clock in Hz.

EDX Bits 31 - 00: Reserved = 0. 

Processor Frequency Information Leaf 

16H 

EAX Bits 15 - 00: Processor Base Frequency (in MHz).

Bits 31 - 16: Reserved = 0.

EBX Bits 15 - 00: Maximum Frequency (in MHz).

Bits 31 - 16: Reserved = 0.

ECX Bits 15 - 00: Bus (Reference) Frequency (in MHz).

Bits 31 -16: Reserved = 0. 

EDX Reserved. 

NOTES: 

* Data is returned from this interface in accordance with the processor's specification and does not reflect
actual values. Suitable use of this data includes the display of processor information in like manner to the
processor brand string and for determining the appropriate range to use when displaying processor
information, e.g., frequency history graphs. The returned information should not be used for any other
purpose as the returned information does not accurately correlate to information / counters returned by
other processor interfaces.

While a processor may support the Processor Frequency Information leaf, fields that return a value of 
zero are not supported. 

System-On-Chip Vendor Attribute Enumeration Main Leaf (EAX = 17H, ECX = 0)

17H

NOTES:

Leaf 17H main leaf (ECX = 0).

Leaf 17H output depends on the initial value in ECX.

Leaf 17H sub-leaves 1 through 3 report the SOC Vendor Brand String.

Leaf 17H is valid if MaxSOCID_Index >= 3.

Leaf 17H sub-leaves 4 and above are reserved.

EAX Bits 31 - 00: MaxSOCID_Index. Reports the maximum input value of supported sub-leaf in leaf 17H.

EBX Bits 15-00: SOC Vendor ID. 

Bit 16: IsVendorScheme. If 1, the SOC Vendor ID field is assigned via an industry standard enumeration 
scheme. Otherwise, the SOC Vendor ID field is assigned by Intel. 

Bits 31 -17: Reserved = 0. 

ECX Bits 31 - 00: Project ID. A unique number an SOC vendor assigns to its SOC projects. 

EDX Bits 31 - 00: Stepping ID. A unique number within an SOC project that an SOC vendor assigns. 

System-On-Chip Vendor Attribute Enumeration Sub-leaf (EAX = 17H, ECX = 1..3)

17H 

EAX Bit 31 - 00: SOC Vendor Brand String. UTF-8 encoded string. 

EBX Bit 31 - 00: SOC Vendor Brand String. UTF-8 encoded string. 

ECX Bit 31 - 00: SOC Vendor Brand String. UTF-8 encoded string. 

EDX Bit 31 - 00: SOC Vendor Brand String. UTF-8 encoded string. 

NOTES: 

Leaf 17H output depends on the initial value in ECX.

SOC Vendor Brand String is a UTF-8 encoded string padded with trailing bytes of 00H.

The complete SOC Vendor Brand String is constructed by concatenating in ascending order of 
EAX:EBX:ECX:EDX and from the sub-leaf 1 fragment towards sub-leaf 3. 


System-On-Chip Vendor Attribute Enumeration Sub-leaves (EAX = 17H, ECX > MaxSOCID_Index)

17H


NOTES:

Leaf 17H output depends on the initial value in ECX.


EAX 

Bits 31 - 00: Reserved = 0. 


EBX 

Bits 31 - 00: Reserved = 0. 


ECX 

Bits 31 - 00: Reserved = 0. 


EDX 

Bits 31 - 00: Reserved = 0. 


Unimplemented CPUID Leaf Functions

40000000H - 4FFFFFFFH

Invalid. No existing or future CPU will return processor identification or feature information if the initial
EAX value is in the range 40000000H to 4FFFFFFFH.


Extended Function CPUID Information

80000000H 

EAX 

Maximum Input Value for Extended Function CPUID Information. 


EBX 

Reserved. 


ECX 

Reserved. 


EDX 

Reserved. 

80000001H 

EAX 

Extended Processor Signature and Feature Bits. 


EBX 

Reserved. 


ECX 

Bit 00: LAHF/SAHF available in 64-bit mode.

Bits 04 - 01: Reserved. 

Bit 05: LZCNT. 

Bits 07 - 06: Reserved. 

Bit 08: PREFETCHW. 

Bits 31 - 09: Reserved. 


EDX 

Bits 10-00: Reserved. 

Bit 11: SYSCALL/SYSRET available in 64-bit mode. 

Bits 19-12: Reserved = 0. 

Bit 20: Execute Disable Bit available. 

Bits 25 - 21: Reserved = 0. 

Bit 26: 1-GByte pages are available if 1.

Bit 27: RDTSCP and IA32_TSC_AUX are available if 1. 

Bit 28: Reserved = 0. 

Bit 29: Intel® 64 Architecture available if 1. 

Bits 31 - 30: Reserved = 0. 

80000002H 

EAX 

Processor Brand String. 


EBX 

Processor Brand String Continued. 


ECX 

Processor Brand String Continued. 


EDX 

Processor Brand String Continued. 

80000003H 

EAX 

Processor Brand String Continued. 


EBX 

Processor Brand String Continued. 


ECX 

Processor Brand String Continued. 


EDX 

Processor Brand String Continued. 

80000004H 

EAX 

Processor Brand String Continued. 


EBX 

Processor Brand String Continued. 


ECX 

Processor Brand String Continued. 


EDX 

Processor Brand String Continued. 

80000005H 

EAX 

Reserved = 0. 


EBX 

Reserved = 0. 


ECX 

Reserved = 0. 


EDX 

Reserved = 0. 

80000006H 

EAX 

Reserved = 0. 


EBX 

Reserved = 0. 


ECX 

Bits 07 - 00: Cache Line size in bytes. 

Bits 11-08: Reserved. 

Bits 15 -12: L2 Associativity field *. 

Bits 31 -16: Cache size in 1K units. 


EDX 

Reserved = 0. 

NOTES: 

* L2 associativity field encodings: 

00H - Disabled.

01H - Direct mapped. 

02H - 2-way. 

04H - 4-way. 

06H - 8-way. 

08H - 16-way.

0FH - Fully associative.

80000007H 

EAX 

Reserved = 0. 


EBX 

Reserved = 0. 


ECX 

Reserved = 0. 


EDX 

Bits 07 - 00: Reserved = 0. 

Bit 08: Invariant TSC available if 1. 

Bits 31 - 09: Reserved = 0. 

80000008H 

EAX 

Linear/Physical Address size. 

Bits 07 - 00: #Physical Address Bits*. 

Bits 15-08: #Linear Address Bits. 

Bits 31 -16: Reserved = 0. 


EBX 

Reserved = 0. 


ECX 

Reserved = 0. 


EDX 

Reserved = 0. 

NOTES: 

* If CPUID.80000008H:EAX[7:0] is supported, the maximum physical address number supported should
come from this field.
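The leaf 01H EBX layout in Table 3-8 can be decoded as shown below. This minimal sketch splits a raw EBX value into its four fields, scales the CLFLUSH field by 8 to get bytes, and applies the nearest-power-of-2 rule from the leaf 01H note. The function names and the sample value 02100800H are hypothetical; on real hardware, EBX would come from executing CPUID with EAX = 01H.

```python
def next_power_of_two(n: int) -> int:
    """Smallest power of 2 that is not smaller than n (requires n >= 1)."""
    return 1 << (n - 1).bit_length()


def decode_leaf01_ebx(ebx: int) -> dict:
    """Split CPUID.01H:EBX into the fields listed for leaf 01H in Table 3-8."""
    max_ids = (ebx >> 16) & 0xFF  # meaningful only if CPUID.1:EDX.HTT[bit 28] = 1
    return {
        "brand_index": ebx & 0xFF,
        # EBX[15:8] counts 8-byte units, so multiply by 8 to get bytes.
        "clflush_line_bytes": ((ebx >> 8) & 0xFF) * 8,
        "max_logical_ids": max_ids,
        # Initial APIC IDs reserved per package (0 if the field reads 0).
        "reserved_apic_ids": next_power_of_two(max_ids) if max_ids else 0,
        "initial_apic_id": (ebx >> 24) & 0xFF,
    }


print(decode_leaf01_ebx(0x02100800))
```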
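The leaf 04H fields in Table 3-8 combine into a total cache size. Because the L, P, W, and S fields marked ** are reported minus one, the size in bytes is (W + 1) * (P + 1) * (L + 1) * (S + 1). The sketch below applies that formula and also walks sub-leaves until the cache type reads 0, following the sub-leaf note for leaf 04H. Function names and register values are hypothetical and describe an assumed 32-KByte, 8-way L1 data cache.

```python
CACHE_TYPES = {0: "null", 1: "data", 2: "instruction", 3: "unified"}


def decode_leaf04(eax: int, ebx: int, ecx: int) -> dict:
    """Decode one CPUID.(EAX=04H, ECX=n) sub-leaf.

    The fields marked ** in Table 3-8 (L, P, W, S) are reported minus one,
    so 1 is added back before use.
    """
    ways = ((ebx >> 22) & 0x3FF) + 1
    partitions = ((ebx >> 12) & 0x3FF) + 1
    line_size = (ebx & 0xFFF) + 1
    sets = ecx + 1
    return {
        "type": CACHE_TYPES.get(eax & 0x1F, "reserved"),
        "level": (eax >> 5) & 0x7,
        "ways": ways,
        "line_size": line_size,
        "size_bytes": ways * partitions * line_size * sets,
    }


def enumerate_caches(subleaves):
    """Decode (EAX, EBX, ECX) tuples for ECX = 0, 1, 2, ... in order,
    stopping at the first sub-leaf whose cache type is 0 (no more caches)."""
    caches = []
    for eax, ebx, ecx in subleaves:
        if (eax & 0x1F) == 0:
            break
        caches.append(decode_leaf04(eax, ebx, ecx))
    return caches


# Hypothetical 32-KByte, 8-way L1 data cache with 64-byte lines and 64 sets.
print(enumerate_caches([(0x21, 0x01C0003F, 63), (0x0, 0x0, 0x0)]))
```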
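The leaf 0BH notes say that software should use the EAX[4:0] shift counts, not EBX[15:0], to enumerate topology. The sketch below decodes one sub-leaf and then uses two shift counts to split an x2APIC ID into package, core, and SMT IDs. It assumes a two-level topology, with one SMT-type sub-leaf and one Core-type sub-leaf. The function names and the sample values are hypothetical.

```python
LEVEL_TYPES = {0: "invalid", 1: "SMT", 2: "Core"}


def decode_leaf0b(eax: int, ebx: int, ecx: int, edx: int) -> dict:
    """Decode one CPUID.(EAX=0BH, ECX=n) sub-leaf."""
    return {
        "shift": eax & 0x1F,            # EAX[4:0]: use this to enumerate topology
        "logical_count": ebx & 0xFFFF,  # EBX[15:0]: display/diagnostics only
        "level_number": ecx & 0xFF,
        "level_type": LEVEL_TYPES.get((ecx >> 8) & 0xFF, "reserved"),
        "x2apic_id": edx,
    }


def split_x2apic_id(x2apic_id: int, smt_shift: int, core_shift: int) -> tuple:
    """Return (package, core, SMT) IDs for an SMT + Core topology.

    smt_shift is EAX[4:0] of the SMT-type sub-leaf; core_shift is EAX[4:0]
    of the Core-type sub-leaf.
    """
    smt_id = x2apic_id & ((1 << smt_shift) - 1)
    core_id = (x2apic_id & ((1 << core_shift) - 1)) >> smt_shift
    package_id = x2apic_id >> core_shift
    return package_id, core_id, smt_id


print(decode_leaf0b(0x1, 0x2, 0x100, 0x13))
print(split_x2apic_id(0x13, 1, 4))
```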
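ECX bit 01 of the leaf 0DH sub-leaves (n > 1) decides whether a component starts on a 64-byte boundary in the compacted XSAVE format. The sketch below shows how that bit shifts component offsets. It assumes the layout described for the XSAVE area in Volume 1: the extended region starts at byte 576 (a 512-byte legacy region plus a 64-byte header), and components appear in ascending index order. The function name and the sizes are illustrative, not values reported by any particular processor.

```python
# Start of the extended region: 512-byte legacy region + 64-byte XSAVE header.
EXTENDED_REGION_START = 576


def compacted_offsets(components: dict) -> dict:
    """Compute compacted-format offsets for XSAVE state components.

    `components` maps each state-component index i >= 2 that is present in
    the compacted area to (size_i, align_i), where size_i would come from
    CPUID.(EAX=0DH, ECX=i):EAX and align_i from CPUID.(EAX=0DH, ECX=i):ECX[1].
    """
    offsets = {}
    next_free = EXTENDED_REGION_START
    for i in sorted(components):
        size, align = components[i]
        if align:
            # Round up to the next 64-byte boundary.
            next_free = (next_free + 63) & ~63
        offsets[i] = next_free
        next_free += size
    return offsets


# Illustrative sizes only; real values must be read from CPUID.
print(compacted_offsets({2: (256, False), 4: (8, False), 5: (64, True)}))
```

In this example, component 5 lands at 896 rather than 840 only because its alignment bit is set.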
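For leaf 12H sub-leaves 2 and higher, the EPC base and size are each split across a register pair. The sketch below reassembles them according to the bit ranges listed in Table 3-8. Bits 31:12 come from EAX or ECX, and bits 51:32 come from EBX[19:0] or EDX[19:0]. It returns None for the invalid type 0000b and rejects reserved types. The function name and register values are hypothetical.

```python
def decode_epc_subleaf(eax: int, ebx: int, ecx: int, edx: int):
    """Decode CPUID.(EAX=12H, ECX>=2); None means an invalid sub-leaf."""
    subleaf_type = eax & 0xF
    if subleaf_type == 0b0000:
        return None  # invalid sub-leaf: no EPC section described
    if subleaf_type != 0b0001:
        raise ValueError(f"reserved sub-leaf type {subleaf_type:#06b}")
    # Physical base: EBX[19:0] -> bits 51:32, EAX[31:12] -> bits 31:12.
    base = ((ebx & 0xFFFFF) << 32) | (eax & 0xFFFFF000)
    # Section size: EDX[19:0] -> bits 51:32, ECX[31:12] -> bits 31:12.
    size = ((edx & 0xFFFFF) << 32) | (ecx & 0xFFFFF000)
    return {"base": base, "size": size, "property": ecx & 0xF}


print(decode_epc_subleaf(0x70200001, 0x0, 0x05D80001, 0x0))
```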
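The leaf 15H formula "TSC frequency" = "core crystal clock frequency" * EBX/EAX can be applied directly, provided both the ratio and the crystal frequency are enumerated. This is a minimal sketch with hypothetical register values (a 24-MHz crystal and a 200/2 ratio). The EAX = 0 check is a defensive assumption added here; it does not come from the table.

```python
def tsc_frequency_hz(eax: int, ebx: int, ecx: int):
    """Return the TSC frequency in Hz from CPUID leaf 15H, or None.

    TSC frequency = core crystal clock frequency (ECX) * EBX / EAX.
    """
    if ebx == 0:
        return None  # TSC/core crystal clock ratio not enumerated
    if ecx == 0:
        return None  # nominal core crystal clock frequency not enumerated
    if eax == 0:
        return None  # defensive guard against division by zero (assumption)
    return ecx * ebx // eax  # integer Hz, rounded down


# Hypothetical register values: 24 MHz crystal, ratio 200/2.
print(tsc_frequency_hz(2, 200, 24_000_000))
```

Integer arithmetic keeps the result exact when the ratio divides evenly. Otherwise the result is rounded down to a whole hertz.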
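The leaf 17H notes describe building the SOC Vendor Brand String by concatenating EAX:EBX:ECX:EDX from sub-leaf 1 through sub-leaf 3, then dropping the 00H padding. The sketch below assumes each register holds four string bytes with the first character in the low eight bits, the same convention used for the vendor identification string. The function name is hypothetical.

```python
import struct


def soc_vendor_brand(subleaves) -> str:
    """Assemble the SOC Vendor Brand String from leaf 17H sub-leaves 1..3.

    `subleaves` holds three (EAX, EBX, ECX, EDX) tuples in sub-leaf order.
    Each register is assumed to hold four string bytes, lowest byte first.
    """
    raw = b"".join(struct.pack("<4I", *regs) for regs in subleaves)
    # Drop the trailing 00H padding before decoding the UTF-8 text.
    return raw.rstrip(b"\x00").decode("utf-8")
```

To try it without hardware, pack a test string into 48 bytes, split it into three groups of four little-endian 32-bit values, and pass those groups in.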
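Leaf 80000006H packs the L2 line size, associativity, and size into ECX. The sketch below decodes these fields using the associativity encodings from the table note. Encodings that the note does not list are reported as "unlisted", which is an assumption of this sketch. The function name and the sample value are hypothetical.

```python
L2_ASSOCIATIVITY = {
    0x0: "disabled",
    0x1: "direct mapped",
    0x2: "2-way",
    0x4: "4-way",
    0x6: "8-way",
    0x8: "16-way",
    0xF: "fully associative",
}


def decode_l2(ecx: int) -> dict:
    """Decode CPUID.80000006H:ECX into L2 cache parameters."""
    return {
        "line_bytes": ecx & 0xFF,
        "associativity": L2_ASSOCIATIVITY.get((ecx >> 12) & 0xF, "unlisted"),
        "size_kb": (ecx >> 16) & 0xFFFF,
    }


print(decode_l2(0x01006040))
```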


INPUT EAX = 0: Returns CPUID's Highest Value for Basic Processor Information and the Vendor Identification String 

When CPUID executes with EAX set to 0, the processor returns the highest value CPUID recognizes for
returning basic processor information. The value is returned in the EAX register and is processor specific.
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A vendor identification string is also returned in EBX, EDX, and ECX. For Intel processors, the string is "Genuin- 
elntel" and is expressed: 

EBX <- 756e6547h (* "Genu", with G in the low eight bits of BL *) 

EDX <- 49656e69h (* "inel", with i in the low eight bits of DL *) 

ECX 6c65746eh (* "ntel", with n In the low eight bits of CL *) 
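Because each register holds four ASCII characters with the first character in the low byte, the vendor string can be reassembled by copying bytes in little-endian order and concatenating the registers in EBX, EDX, ECX order. The C sketch below assumes the register values are already available. The function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Copy the four bytes of a register into dst, least-significant byte first. */
static void copy_reg_bytes(char *dst, uint32_t reg)
{
    for (int i = 0; i < 4; i++)
        dst[i] = (char)((reg >> (8 * i)) & 0xFFu);
}

/* Assemble the 12-character vendor string; note the EBX, EDX, ECX order. */
void cpuid_vendor_string(uint32_t ebx, uint32_t ecx, uint32_t edx, char out[13])
{
    copy_reg_bytes(out, ebx);
    copy_reg_bytes(out + 4, edx);
    copy_reg_bytes(out + 8, ecx);
    out[12] = '\0';
}

bool is_genuine_intel(uint32_t ebx, uint32_t ecx, uint32_t edx)
{
    char vendor[13];
    cpuid_vendor_string(ebx, ecx, edx, vendor);
    return strcmp(vendor, "GenuineIntel") == 0;
}
```

The same byte order applies to the 48-byte processor brand string. For that string, software concatenates EAX, EBX, ECX, and EDX from leaves 80000002H, 80000003H, and 80000004H in turn.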

INPUT EAX = 80000000H: Returns CPUID's Highest Value for Extended Processor Information 

When CPUID executes with EAX set to 80000000H, the processor returns the highest value the processor recognizes
for returning extended processor information. The value is returned in the EAX register and is processor
specific.

IA32_BIOS_SIGN_ID Returns Microcode Update Signature

For processors that support the microcode update facility, the IA32_BIOS_SIGN_ID MSR is loaded with the update 
signature whenever CPUID executes. The signature is returned in the upper DWORD. For details, see Chapter 9 in 
the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. 

INPUT EAX = 01H: Returns Model, Family, Stepping Information 

When CPUID executes with EAX set to 01H, version information is returned in EAX (see Figure 3-6). For example,
the model, family, and processor type for the Intel Xeon processor 5100 series are as follows:

• Model: 1111B

• Family: 0110B

• Processor Type: 00B

See Table 3-9 for available processor type values. Stepping IDs are provided as needed. 


Bits 31-28: Reserved.
Bits 27-20: Extended Family ID.
Bits 19-16: Extended Model ID.
Bits 15-14: Reserved.
Bits 13-12: Processor Type.
Bits 11-08: Family ID (0FH for the Pentium 4 Processor Family).
Bits 07-04: Model.
Bits 03-00: Stepping ID.

Figure 3-6. Version Information Returned by CPUID in EAX 
Table 3-9. Processor Type Field

Type                                                      Encoding
Original OEM Processor                                    00B
Intel OverDrive® Processor                                01B
Dual processor (not applicable to Intel486 processors)    10B
Intel reserved                                            11B


NOTE 

See Chapter 19 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, 
for information on identifying earlier IA-32 processors. 

The Extended Family ID needs to be examined only when the Family ID is 0FH. Integrate the fields into a display
using the following rule:

IF Family_ID ≠ 0FH
    THEN DisplayFamily = Family_ID;
    ELSE DisplayFamily = Extended_Family_ID + Family_ID;
    (* Right justify and zero-extend 4-bit field. *)
FI;
(* Show DisplayFamily as HEX field. *)

The Extended Model ID needs to be examined only when the Family ID is 06H or 0FH. Integrate the field into a
display using the following rule:

IF (Family_ID = 06H or Family_ID = 0FH)
    THEN DisplayModel = (Extended_Model_ID « 4) + Model_ID;
    (* Right justify and zero-extend 4-bit field; display Model_ID as HEX field.*)
    ELSE DisplayModel = Model_ID;
FI;
(* Show DisplayModel as HEX field. *)
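The two display rules above translate directly into C. This sketch extracts the fields of CPUID.01H:EAX using the bit positions of Figure 3-6 and then applies the DisplayFamily and DisplayModel rules. The function names are illustrative, and the EAX values used with them below are hypothetical inputs chosen to exercise each branch.

```c
#include <stdint.h>

/* Bit fields of CPUID.01H:EAX (see Figure 3-6). */
unsigned stepping_id(uint32_t eax)    { return eax & 0xFu; }
unsigned model_id(uint32_t eax)       { return (eax >> 4) & 0xFu; }
unsigned family_id(uint32_t eax)      { return (eax >> 8) & 0xFu; }
unsigned processor_type(uint32_t eax) { return (eax >> 12) & 0x3u; }
unsigned ext_model_id(uint32_t eax)   { return (eax >> 16) & 0xFu; }
unsigned ext_family_id(uint32_t eax)  { return (eax >> 20) & 0xFFu; }

/* DisplayFamily rule: Extended Family ID is added only when Family ID is 0FH. */
unsigned display_family(uint32_t eax)
{
    unsigned family = family_id(eax);
    return (family != 0x0Fu) ? family : ext_family_id(eax) + family;
}

/* DisplayModel rule: Extended Model ID is prepended only for Family ID 06H or 0FH. */
unsigned display_model(uint32_t eax)
{
    unsigned family = family_id(eax);
    if (family == 0x06u || family == 0x0Fu)
        return (ext_model_id(eax) << 4) + model_id(eax);
    return model_id(eax);
}
```

For example, an EAX value of 000306A9H yields DisplayFamily 06H and DisplayModel 3AH. For a Family ID of 05H, by contrast, the Extended Model ID is ignored.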

INPUT EAX = 01H: Returns Additional Information in EBX 

When CPUID executes with EAX set to 01H, additional information is returned to the EBX register:

• Brand index (low byte of EBX) — this number provides an entry into a brand string table that contains brand 
strings for IA-32 processors. More information about this field is provided later in this section. 

• CLFLUSH instruction cache line size (second byte of EBX) — this number indicates the size of the cache line 
flushed by the CLFLUSH and CLFLUSHOPT instructions in 8-byte increments. This field was introduced in the 
Pentium 4 processor. 

• Local APIC ID (high byte of EBX) — this number is the 8-bit ID that is assigned to the local APIC on the 
processor during power up. This field was introduced in the Pentium 4 processor. 
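The three EBX fields listed above are single bytes, so decoding them takes one shift and mask each. Because the CLFLUSH field counts 8-byte units, it must be multiplied by 8 to obtain a size in bytes. The sketch below assumes the raw EBX value from CPUID.01H is already available. The function names are hypothetical.

```c
#include <stdint.h>

/* CPUID.01H:EBX[7:0]: brand index. */
unsigned brand_index(uint32_t ebx)        { return ebx & 0xFFu; }

/* CPUID.01H:EBX[15:8]: CLFLUSH line size in 8-byte units, converted to bytes. */
unsigned clflush_line_bytes(uint32_t ebx) { return ((ebx >> 8) & 0xFFu) * 8u; }

/* CPUID.01H:EBX[31:24]: local APIC ID assigned at power up. */
unsigned initial_apic_id(uint32_t ebx)    { return ebx >> 24; }
```

For a hypothetical EBX value of 05100800H, these helpers report brand index 0, a 64-byte CLFLUSH line, and APIC ID 5.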

INPUT EAX = 01H: Returns Feature Information in ECX and EDX 

When CPUID executes with EAX set to 01H, feature information is returned in ECX and EDX.

• Figure 3-7 and Table 3-10 show encodings for ECX. 

• Figure 3-8 and Table 3-11 show encodings for EDX. 

For all feature flags, a 1 indicates that the feature is supported. Consult Intel documentation to interpret feature flags properly.

NOTE 

Software must confirm that a processor feature is present using feature flags returned by CPUID 
prior to using the feature. Software should not depend on future offerings retaining all features. 
[Figure: bit map of ECX bits 31-0, labelling each bit with its mnemonic (SSE3 in bit 0 through RDRAND in bit 30) as listed in Table 3-10; unassigned bits are marked Reserved.]


Figure 3-7. Feature Information Returned in the ECX Register 


Table 3-10. Feature Information Returned in the ECX Register

Bit #  Mnemonic             Description
0      SSE3                 Streaming SIMD Extensions 3 (SSE3). A value of 1 indicates the processor supports this
                            technology.
1      PCLMULQDQ            PCLMULQDQ. A value of 1 indicates the processor supports the PCLMULQDQ instruction.
2      DTES64               64-bit DS Area. A value of 1 indicates the processor supports DS area using 64-bit layout.
3      MONITOR              MONITOR/MWAIT. A value of 1 indicates the processor supports this feature.
4      DS-CPL               CPL Qualified Debug Store. A value of 1 indicates the processor supports the extensions to the
                            Debug Store feature to allow for branch message storage qualified by CPL.
5      VMX                  Virtual Machine Extensions. A value of 1 indicates that the processor supports this technology.
6      SMX                  Safer Mode Extensions. A value of 1 indicates that the processor supports this technology. See
                            Chapter 6, "Safer Mode Extensions Reference".
7      EIST                 Enhanced Intel SpeedStep® technology. A value of 1 indicates that the processor supports this
                            technology.
8      TM2                  Thermal Monitor 2. A value of 1 indicates that the processor supports this technology.
9      SSSE3                A value of 1 indicates the presence of the Supplemental Streaming SIMD Extensions 3 (SSSE3). A
                            value of 0 indicates the instruction extensions are not present in the processor.

10     CNXT-ID              L1 Context ID. A value of 1 indicates the L1 data cache mode can be set to either adaptive mode
                            or shared mode. A value of 0 indicates this feature is not supported. See definition of the
                            IA32_MISC_ENABLE MSR Bit 24 (L1 Data Cache Context Mode) for details.
11     SDBG                 A value of 1 indicates the processor supports IA32_DEBUG_INTERFACE MSR for silicon debug.
12     FMA                  A value of 1 indicates the processor supports FMA extensions using YMM state.
13     CMPXCHG16B           CMPXCHG16B Available. A value of 1 indicates that the feature is available. See the
                            "CMPXCHG8B/CMPXCHG16B—Compare and Exchange Bytes" section in this chapter for a
                            description.
14     xTPR Update Control  xTPR Update Control. A value of 1 indicates that the processor supports changing
                            IA32_MISC_ENABLE[bit 23].
15     PDCM                 Perfmon and Debug Capability: A value of 1 indicates the processor supports the performance
                            and debug feature indication MSR IA32_PERF_CAPABILITIES.
16     Reserved             Reserved
17     PCID                 Process-context identifiers. A value of 1 indicates that the processor supports PCIDs and that
                            software may set CR4.PCIDE to 1.
18     DCA                  A value of 1 indicates the processor supports the ability to prefetch data from a memory mapped
                            device.
19     SSE4.1               A value of 1 indicates that the processor supports SSE4.1.
20     SSE4.2               A value of 1 indicates that the processor supports SSE4.2.
21     X2APIC               A value of 1 indicates that the processor supports the x2APIC feature.
22     MOVBE                A value of 1 indicates that the processor supports the MOVBE instruction.
23     POPCNT               A value of 1 indicates that the processor supports the POPCNT instruction.
24     TSC-Deadline         A value of 1 indicates that the processor's local APIC timer supports one-shot operation using a
                            TSC deadline value.
25     AESNI                A value of 1 indicates that the processor supports the AESNI instruction extensions.
26     XSAVE                A value of 1 indicates that the processor supports the XSAVE/XRSTOR processor extended states
                            feature, the XSETBV/XGETBV instructions, and XCR0.
27     OSXSAVE              A value of 1 indicates that the OS has set CR4.OSXSAVE[bit 18] to enable XSETBV/XGETBV
                            instructions to access XCR0 and to support processor extended state management using
                            XSAVE/XRSTOR.
28     AVX                  A value of 1 indicates the processor supports the AVX instruction extensions.
29     F16C                 A value of 1 indicates that the processor supports 16-bit floating-point conversion instructions.
30     RDRAND               A value of 1 indicates that the processor supports the RDRAND instruction.
31     Not Used             Always returns 0.
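Bits 27 (OSXSAVE) and 28 (AVX) are normally tested together. AVX bit 28 reports only that the processor implements AVX. OSXSAVE bit 27 reports that the OS has enabled XGETBV, which software then uses to check that the OS saves the SSE and AVX state components (XCR0 bits 1 and 2). The C sketch below follows the detection procedure described in Volume 1. To keep it portable, XCR0 is passed in as a parameter rather than read with XGETBV. The macro and function names are illustrative.

```c
#include <stdbool.h>
#include <stdint.h>

#define ECX_OSXSAVE (1u << 27) /* CPUID.01H:ECX.OSXSAVE */
#define ECX_AVX     (1u << 28) /* CPUID.01H:ECX.AVX */
#define XCR0_SSE    (1u << 1)  /* XMM state enabled by the OS */
#define XCR0_AVX    (1u << 2)  /* upper halves of YMM registers enabled by the OS */

/* Generic test of one feature flag in a CPUID output register. */
bool has_feature(uint32_t reg, unsigned bit) { return (reg >> bit) & 1u; }

/* AVX is usable only if the CPU supports it, the OS enabled XGETBV, and
   XCR0 (obtained by the caller via XGETBV with ECX = 0) has bits 2:1 set. */
bool avx_usable(uint32_t ecx_leaf1, uint64_t xcr0)
{
    if ((ecx_leaf1 & (ECX_OSXSAVE | ECX_AVX)) != (ECX_OSXSAVE | ECX_AVX))
        return false;
    return (xcr0 & (XCR0_SSE | XCR0_AVX)) == (XCR0_SSE | XCR0_AVX);
}
```

Checking bit 27 first matters because executing XGETBV when the OS has not set CR4.OSXSAVE causes an exception.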
[Figure: bit map of EDX bits 31-0, labelling each bit with its mnemonic (FPU in bit 0 through PBE in bit 31) as listed in Table 3-11; bits 10, 20, and 30 are marked Reserved.]


Figure 3-8. Feature Information Returned in the EDX Register 
Table 3-11. More on Feature Information Returned in the EDX Register

Bit #  Mnemonic  Description
0      FPU       Floating Point Unit On-Chip. The processor contains an x87 FPU.
1      VME       Virtual 8086 Mode Enhancements. Virtual 8086 mode enhancements, including CR4.VME for controlling the
                 feature, CR4.PVI for protected mode virtual interrupts, software interrupt indirection, expansion of the
                 TSS with the software indirection bitmap, and EFLAGS.VIF and EFLAGS.VIP flags.
2      DE        Debugging Extensions. Support for I/O breakpoints, including CR4.DE for controlling the feature, and
                 optional trapping of accesses to DR4 and DR5.
3      PSE       Page Size Extension. Large pages of size 4 MByte are supported, including CR4.PSE for controlling the
                 feature, the defined dirty bit in PDE (Page Directory Entries), optional reserved bit trapping in CR3,
                 PDEs, and PTEs.
4      TSC       Time Stamp Counter. The RDTSC instruction is supported, including CR4.TSD for controlling privilege.
5      MSR       Model Specific Registers RDMSR and WRMSR Instructions. The RDMSR and WRMSR instructions are
                 supported. Some of the MSRs are implementation dependent.
6      PAE       Physical Address Extension. Physical addresses greater than 32 bits are supported: extended page table
                 entry formats, an extra level in the page translation tables is defined, 2-MByte pages are supported
                 instead of 4 Mbyte pages if PAE bit is 1.
7      MCE       Machine Check Exception. Exception 18 is defined for Machine Checks, including CR4.MCE for controlling
                 the feature. This feature does not define the model-specific implementations of machine-check error
                 logging, reporting, and processor shutdowns. Machine Check exception handlers may have to depend on
                 processor version to do model specific processing of the exception, or test for the presence of the
                 Machine Check feature.
8      CX8       CMPXCHG8B Instruction. The compare-and-exchange 8 bytes (64 bits) instruction is supported (implicitly
                 locked and atomic).
9      APIC      APIC On-Chip. The processor contains an Advanced Programmable Interrupt Controller (APIC), responding
                 to memory mapped commands in the physical address range FEE00000H to FEE00FFFH (by default; some
                 processors permit the APIC to be relocated).
10     Reserved  Reserved
11     SEP       SYSENTER and SYSEXIT Instructions. The SYSENTER and SYSEXIT instructions and associated MSRs are
                 supported.
12     MTRR      Memory Type Range Registers. MTRRs are supported. The MTRRcap MSR contains feature bits that describe
                 what memory types are supported, how many variable MTRRs are supported, and whether fixed MTRRs are
                 supported.
13     PGE       Page Global Bit. The global bit is supported in paging-structure entries that map a page, indicating TLB
                 entries that are common to different processes and need not be flushed. The CR4.PGE bit controls this
                 feature.
14     MCA       Machine Check Architecture. A value of 1 indicates the Machine Check Architecture of reporting machine
                 errors is supported. The MCG_CAP MSR contains feature bits describing how many banks of error reporting
                 MSRs are supported.
15     CMOV      Conditional Move Instructions. The conditional move instruction CMOV is supported. In addition, if x87
                 FPU is present as indicated by the CPUID.FPU feature bit, then the FCOMI and FCMOV instructions are
                 supported.
16     PAT       Page Attribute Table. Page Attribute Table is supported. This feature augments the Memory Type Range
                 Registers (MTRRs), allowing an operating system to specify attributes of memory accessed through a
                 linear address on a 4KB granularity.
17     PSE-36    36-Bit Page Size Extension. 4-MByte pages addressing physical memory beyond 4 GBytes are supported with
                 32-bit paging. This feature indicates that upper bits of the physical address of a 4-MByte page are
                 encoded in bits 20:13 of the page-directory entry. Such physical addresses are limited by MAXPHYADDR
                 and may be up to 40 bits in size.
18     PSN       Processor Serial Number. The processor supports the 96-bit processor identification number feature and
                 the feature is enabled.
19     CLFSH     CLFLUSH Instruction. The CLFLUSH instruction is supported.
20     Reserved  Reserved

21     DS        Debug Store. The processor supports the ability to write debug information into a memory resident
                 buffer. This feature is used by the branch trace store (BTS) and precise event-based sampling (PEBS)
                 facilities (see Chapter 23, "Introduction to Virtual-Machine Extensions," in the Intel® 64 and IA-32
                 Architectures Software Developer's Manual, Volume 3C).
22     ACPI      Thermal Monitor and Software Controlled Clock Facilities. The processor implements internal MSRs that
                 allow processor temperature to be monitored and processor performance to be modulated in predefined
                 duty cycles under software control.
23     MMX       Intel MMX Technology. The processor supports the Intel MMX technology.
24     FXSR      FXSAVE and FXRSTOR Instructions. The FXSAVE and FXRSTOR instructions are supported for fast save and
                 restore of the floating point context. Presence of this bit also indicates that CR4.OSFXSR is available
                 for an operating system to indicate that it supports the FXSAVE and FXRSTOR instructions.
25     SSE       SSE. The processor supports the SSE extensions.
26     SSE2      SSE2. The processor supports the SSE2 extensions.
27     SS        Self Snoop. The processor supports the management of conflicting memory types by performing a snoop of
                 its own cache structure for transactions issued to the bus.
28     HTT       Max APIC IDs reserved field is Valid. A value of 0 for HTT indicates there is only a single logical
                 processor in the package and software should assume only a single APIC ID is reserved. A value of 1 for
                 HTT indicates the value in CPUID.1.EBX[23:16] (the Maximum number of addressable IDs for logical
                 processors in this package) is valid for the package.
29     TM        Thermal Monitor. The processor implements the thermal monitor automatic thermal control circuitry (TCC).
30     Reserved  Reserved
31     PBE       Pending Break Enable. The processor supports the use of the FERR#/PBE# pin when the processor is in the
                 stop-clock state (STPCLK# is asserted) to signal the processor that an interrupt is pending and that
                 the processor should return to normal operation to handle the interrupt. Bit 10 (PBE enable) in the
                 IA32_MISC_ENABLE MSR enables this capability.
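The HTT entry makes the meaning of CPUID.1:EBX[23:16] conditional on EDX bit 28. The small C sketch below applies that rule. It assumes the caller supplies the raw EBX and EDX values from CPUID.01H, and the function name is hypothetical.

```c
#include <stdint.h>

/* Number of addressable IDs for logical processors in the package, per the HTT rule:
   HTT = 0 -> assume a single APIC ID; HTT = 1 -> CPUID.1:EBX[23:16] is valid. */
unsigned max_logical_ids(uint32_t ebx_leaf1, uint32_t edx_leaf1)
{
    if (((edx_leaf1 >> 28) & 1u) == 0)
        return 1;
    return (ebx_leaf1 >> 16) & 0xFFu;
}
```

The value describes reserved APIC IDs, not necessarily the number of logical processors actually present.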


INPUT EAX = 02H: TLB/Cache/Prefetch Information Returned in EAX, EBX, ECX, EDX 

When CPUID executes with EAX set to 02H, the processor returns information about the processor's internal TLBs,
cache, and prefetch hardware in the EAX, EBX, ECX, and EDX registers. The information is reported in encoded form
and falls into the following categories:

• The least-significant byte in register EAX (register AL) always returns 01H. Software should ignore this value
and not interpret it as an informational descriptor.

• The most significant bit (bit 31) of each register indicates whether the register contains valid information (set
to 0) or is reserved (set to 1).

• If a register contains valid information, the information is contained in 1-byte descriptors. There are four types
of encoding values for the byte descriptor; the encoding type is noted in the second column of Table 3-12.
Table 3-12 lists the encoding of these descriptors. Note that the order of descriptors in the EAX, EBX, ECX, and EDX
registers is not defined; that is, specific bytes are not designated to contain descriptors for specific cache,
prefetch, or TLB types. The descriptors may appear in any order. Note also that a processor may report a general
descriptor type (FFH) and not report any byte descriptor of "cache type" via CPUID leaf 2.
Table 3-12. Encoding of CPUID Leaf 2 Descriptors

Value  Type      Description
00H    General   Null descriptor, this byte contains no information
01H    TLB       Instruction TLB: 4 KByte pages, 4-way set associative, 32 entries
02H    TLB       Instruction TLB: 4 MByte pages, fully associative, 2 entries
03H    TLB       Data TLB: 4 KByte pages, 4-way set associative, 64 entries
04H    TLB       Data TLB: 4 MByte pages, 4-way set associative, 8 entries
05H    TLB       Data TLB1: 4 MByte pages, 4-way set associative, 32 entries
06H    Cache     1st-level instruction cache: 8 KBytes, 4-way set associative, 32 byte line size
08H    Cache     1st-level instruction cache: 16 KBytes, 4-way set associative, 32 byte line size
09H    Cache     1st-level instruction cache: 32 KBytes, 4-way set associative, 64 byte line size
0AH    Cache     1st-level data cache: 8 KBytes, 2-way set associative, 32 byte line size
0BH    TLB       Instruction TLB: 4 MByte pages, 4-way set associative, 4 entries
0CH    Cache     1st-level data cache: 16 KBytes, 4-way set associative, 32 byte line size
0DH    Cache     1st-level data cache: 16 KBytes, 4-way set associative, 64 byte line size
0EH    Cache     1st-level data cache: 24 KBytes, 6-way set associative, 64 byte line size
1DH    Cache     2nd-level cache: 128 KBytes, 2-way set associative, 64 byte line size
21H    Cache     2nd-level cache: 256 KBytes, 8-way set associative, 64 byte line size
22H    Cache     3rd-level cache: 512 KBytes, 4-way set associative, 64 byte line size, 2 lines per sector
23H    Cache     3rd-level cache: 1 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector
24H    Cache     2nd-level cache: 1 MBytes, 16-way set associative, 64 byte line size
25H    Cache     3rd-level cache: 2 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector
29H    Cache     3rd-level cache: 4 MBytes, 8-way set associative, 64 byte line size, 2 lines per sector
2CH    Cache     1st-level data cache: 32 KBytes, 8-way set associative, 64 byte line size
30H    Cache     1st-level instruction cache: 32 KBytes, 8-way set associative, 64 byte line size
40H    Cache     No 2nd-level cache or, if processor contains a valid 2nd-level cache, no 3rd-level cache
41H    Cache     2nd-level cache: 128 KBytes, 4-way set associative, 32 byte line size
42H    Cache     2nd-level cache: 256 KBytes, 4-way set associative, 32 byte line size
43H    Cache     2nd-level cache: 512 KBytes, 4-way set associative, 32 byte line size
44H    Cache     2nd-level cache: 1 MByte, 4-way set associative, 32 byte line size
45H    Cache     2nd-level cache: 2 MByte, 4-way set associative, 32 byte line size
46H    Cache     3rd-level cache: 4 MByte, 4-way set associative, 64 byte line size
47H    Cache     3rd-level cache: 8 MByte, 8-way set associative, 64 byte line size
48H    Cache     2nd-level cache: 3 MByte, 12-way set associative, 64 byte line size
49H    Cache     3rd-level cache: 4MB, 16-way set associative, 64-byte line size (Intel Xeon processor MP, Family 0FH,
                 Model 06H);
                 2nd-level cache: 4 MByte, 16-way set associative, 64 byte line size
4AH    Cache     3rd-level cache: 6 MByte, 12-way set associative, 64 byte line size
4BH    Cache     3rd-level cache: 8 MByte, 16-way set associative, 64 byte line size
4CH    Cache     3rd-level cache: 12 MByte, 12-way set associative, 64 byte line size
4DH    Cache     3rd-level cache: 16 MByte, 16-way set associative, 64 byte line size
4EH    Cache     2nd-level cache: 6 MByte, 24-way set associative, 64 byte line size
4FH    TLB       Instruction TLB: 4 KByte pages, 32 entries

50H    TLB       Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 64 entries
51H    TLB       Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 128 entries
52H    TLB       Instruction TLB: 4 KByte and 2-MByte or 4-MByte pages, 256 entries
55H    TLB       Instruction TLB: 2-MByte or 4-MByte pages, fully associative, 7 entries
56H    TLB       Data TLB0: 4 MByte pages, 4-way set associative, 16 entries
57H    TLB       Data TLB0: 4 KByte pages, 4-way associative, 16 entries
59H    TLB       Data TLB0: 4 KByte pages, fully associative, 16 entries
5AH    TLB       Data TLB0: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries
5BH    TLB       Data TLB: 4 KByte and 4 MByte pages, 64 entries
5CH    TLB       Data TLB: 4 KByte and 4 MByte pages, 128 entries
5DH    TLB       Data TLB: 4 KByte and 4 MByte pages, 256 entries
60H    Cache     1st-level data cache: 16 KByte, 8-way set associative, 64 byte line size
61H    TLB       Instruction TLB: 4 KByte pages, fully associative, 48 entries
63H    TLB       Data TLB: 2 MByte or 4 MByte pages, 4-way set associative, 32 entries and a separate array with
                 1 GByte pages, 4-way set associative, 4 entries
64H    TLB       Data TLB: 4 KByte pages, 4-way set associative, 512 entries
66H    Cache     1st-level data cache: 8 KByte, 4-way set associative, 64 byte line size
67H    Cache     1st-level data cache: 16 KByte, 4-way set associative, 64 byte line size
68H    Cache     1st-level data cache: 32 KByte, 4-way set associative, 64 byte line size
6AH    Cache     uTLB: 4 KByte pages, 8-way set associative, 64 entries
6BH    Cache     DTLB: 4 KByte pages, 8-way set associative, 256 entries
6CH    Cache     DTLB: 2M/4M pages, 8-way set associative, 128 entries
6DH    Cache     DTLB: 1 GByte pages, fully associative, 16 entries
70H    Cache     Trace cache: 12 K-μop, 8-way set associative
71H    Cache     Trace cache: 16 K-μop, 8-way set associative
72H    Cache     Trace cache: 32 K-μop, 8-way set associative
76H    TLB       Instruction TLB: 2M/4M pages, fully associative, 8 entries
78H    Cache     2nd-level cache: 1 MByte, 4-way set associative, 64 byte line size
79H    Cache     2nd-level cache: 128 KByte, 8-way set associative, 64 byte line size, 2 lines per sector
7AH    Cache     2nd-level cache: 256 KByte, 8-way set associative, 64 byte line size, 2 lines per sector
7BH    Cache     2nd-level cache: 512 KByte, 8-way set associative, 64 byte line size, 2 lines per sector
7CH    Cache     2nd-level cache: 1 MByte, 8-way set associative, 64 byte line size, 2 lines per sector
7DH    Cache     2nd-level cache: 2 MByte, 8-way set associative, 64 byte line size
7FH    Cache     2nd-level cache: 512 KByte, 2-way set associative, 64-byte line size
80H    Cache     2nd-level cache: 512 KByte, 8-way set associative, 64-byte line size
82H    Cache     2nd-level cache: 256 KByte, 8-way set associative, 32 byte line size
83H    Cache     2nd-level cache: 512 KByte, 8-way set associative, 32 byte line size
84H    Cache     2nd-level cache: 1 MByte, 8-way set associative, 32 byte line size
85H    Cache     2nd-level cache: 2 MByte, 8-way set associative, 32 byte line size
86H    Cache     2nd-level cache: 512 KByte, 4-way set associative, 64 byte line size
87H    Cache     2nd-level cache: 1 MByte, 8-way set associative, 64 byte line size

A0H    DTLB      DTLB: 4k pages, fully associative, 32 entries
B0H    TLB       Instruction TLB: 4 KByte pages, 4-way set associative, 128 entries
B1H    TLB       Instruction TLB: 2M pages, 4-way, 8 entries or 4M pages, 4-way, 4 entries
B2H    TLB       Instruction TLB: 4 KByte pages, 4-way set associative, 64 entries
B3H    TLB       Data TLB: 4 KByte pages, 4-way set associative, 128 entries
B4H    TLB       Data TLB1: 4 KByte pages, 4-way associative, 256 entries
B5H    TLB       Instruction TLB: 4 KByte pages, 8-way set associative, 64 entries
B6H    TLB       Instruction TLB: 4 KByte pages, 8-way set associative, 128 entries
BAH    TLB       Data TLB1: 4 KByte pages, 4-way associative, 64 entries
C0H    TLB       Data TLB: 4 KByte and 4 MByte pages, 4-way associative, 8 entries
C1H    STLB      Shared 2nd-Level TLB: 4 KByte/2 MByte pages, 8-way associative, 1024 entries
C2H    DTLB      DTLB: 4 KByte/2 MByte pages, 4-way associative, 16 entries
C3H    STLB      Shared 2nd-Level TLB: 4 KByte/2 MByte pages, 6-way associative, 1536 entries. Also 1 GByte pages,
                 4-way, 16 entries.
C4H    DTLB      DTLB: 2M/4M Byte pages, 4-way associative, 32 entries
CAH    STLB      Shared 2nd-Level TLB: 4 KByte pages, 4-way associative, 512 entries
D0H    Cache     3rd-level cache: 512 KByte, 4-way set associative, 64 byte line size
D1H    Cache     3rd-level cache: 1 MByte, 4-way set associative, 64 byte line size
D2H    Cache     3rd-level cache: 2 MByte, 4-way set associative, 64 byte line size
D6H    Cache     3rd-level cache: 1 MByte, 8-way set associative, 64 byte line size
D7H    Cache     3rd-level cache: 2 MByte, 8-way set associative, 64 byte line size
D8H    Cache     3rd-level cache: 4 MByte, 8-way set associative, 64 byte line size
DCH    Cache     3rd-level cache: 1.5 MByte, 12-way set associative, 64 byte line size
DDH    Cache     3rd-level cache: 3 MByte, 12-way set associative, 64 byte line size
DEH    Cache     3rd-level cache: 6 MByte, 12-way set associative, 64 byte line size
E2H    Cache     3rd-level cache: 2 MByte, 16-way set associative, 64 byte line size
E3H    Cache     3rd-level cache: 4 MByte, 16-way set associative, 64 byte line size
E4H    Cache     3rd-level cache: 8 MByte, 16-way set associative, 64 byte line size
EAH    Cache     3rd-level cache: 12 MByte, 24-way set associative, 64 byte line size
EBH    Cache     3rd-level cache: 18 MByte, 24-way set associative, 64 byte line size
ECH    Cache     3rd-level cache: 24 MByte, 24-way set associative, 64 byte line size
F0H    Prefetch  64-Byte prefetching
F1H    Prefetch  128-Byte prefetching
FFH    General   CPUID leaf 2 does not report cache descriptor information, use CPUID leaf 4 to query cache parameters


Example 3-1. Example of Cache and TLB Interpretation 

The first member of the family of Pentium 4 processors returns the following information about caches and TLBs
when CPUID executes with an input value of 2:


EAX 66 5B 50 01H
EBX 0H
ECX 0H
EDX 00 7A 70 00H
Which means:

• The least-significant byte (byte 0) of register EAX is set to 01H. This value should be ignored.

• The most-significant bit of all four registers (EAX, EBX, ECX, and EDX) is set to 0, indicating that each register 
contains valid 1-byte descriptors. 

• Bytes 1, 2, and 3 of register EAX indicate that the processor has: 

— 50H - a 64-entry instruction TLB, for mapping 4-KByte and 2-MByte or 4-MByte pages. 

— 5BH - a 64-entry data TLB, for mapping 4-KByte and 4-MByte pages. 

— 66H - an 8-KByte 1st level data cache, 4-way set associative, with a 64-Byte cache line size. 

• The descriptors in registers EBX and ECX are valid, but contain NULL descriptors. 

• Bytes 0, 1, 2, and 3 of register EDX indicate that the processor has: 

— 00H - NULL descriptor.

— 70H - Trace cache: 12 K-μop, 8-way set associative.

— 7AH - a 256-KByte 2nd level cache, 8-way set associative, with a sectored, 64-byte cache line size.

— 00H - NULL descriptor.
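The interpretation rules for leaf 02H can be sketched in C. The sketch skips AL, skips any register whose bit 31 is set, and drops 00H null descriptors. It assumes the four register values have already been obtained from CPUID and are passed in EAX, EBX, ECX, EDX order. The function name is hypothetical, and looking up each surviving byte in Table 3-12 is left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Collect the non-null 1-byte descriptors from a CPUID leaf 2 result.
   regs[] holds EAX, EBX, ECX, EDX; out must hold at least 15 bytes
   (3 usable bytes in EAX plus 4 in each of the other registers).
   Returns the number of descriptors written. */
size_t leaf2_descriptors(const uint32_t regs[4], uint8_t out[15])
{
    size_t n = 0;
    for (int r = 0; r < 4; r++) {
        if (regs[r] & 0x80000000u)      /* bit 31 set: register is reserved */
            continue;
        for (int b = 0; b < 4; b++) {
            if (r == 0 && b == 0)       /* AL always returns 01H; not a descriptor */
                continue;
            uint8_t d = (uint8_t)(regs[r] >> (8 * b));
            if (d != 0x00)              /* 00H is the null descriptor */
                out[n++] = d;
        }
    }
    return n;
}
```

With the Example 3-1 values (EAX = 665B5001H, EBX = ECX = 0H, EDX = 007A7000H), the function returns 50H, 5BH, 66H, 70H, and 7AH. If the result contains FFH, software should query cache parameters with CPUID leaf 4 instead.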

INPUT EAX = 04H: Returns Deterministic Cache Parameters for Each Level 

When CPUID executes with EAX set to 04H and ECX contains an index value, the processor returns encoded data 
that describe a set of deterministic cache parameters (for the cache level associated with the input in ECX). Valid 
index values start from 0. 

Software can enumerate the deterministic cache parameters for each level of the cache hierarchy, starting with an
index value of 0 and continuing until the cache type field (EAX[4:0]) reports 0. The architecturally defined fields
reported by deterministic cache parameters are documented in Table 3-8.

This Cache Size in Bytes
= (Ways + 1) * (Partitions + 1) * (Line_Size + 1) * (Sets + 1)
= (EBX[31:22] + 1) * (EBX[21:12] + 1) * (EBX[11:0] + 1) * (ECX + 1)
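Each leaf 04H field is reported as its value minus one, so the size formula adds 1 to every factor. The C sketch below evaluates that formula for one sub-leaf. It assumes the caller has executed CPUID with EAX = 04H and the desired ECX index. It also decodes the cache type (EAX[4:0]) and cache level (EAX[7:5]) used to stop enumeration. Those two positions come from the leaf 04H entry of Table 3-8, which is not reproduced here, and the function names are illustrative.

```c
#include <stdint.h>

/* Size in bytes of the cache described by one CPUID.04H sub-leaf. */
uint64_t leaf4_cache_size(uint32_t ebx, uint32_t ecx)
{
    uint64_t ways       = ((ebx >> 22) & 0x3FFu) + 1; /* EBX[31:22] */
    uint64_t partitions = ((ebx >> 12) & 0x3FFu) + 1; /* EBX[21:12] */
    uint64_t line_size  = (ebx & 0xFFFu) + 1;         /* EBX[11:0]  */
    uint64_t sets       = (uint64_t)ecx + 1;          /* ECX        */
    return ways * partitions * line_size * sets;
}

/* EAX[4:0]: 0 means no more caches (stop enumerating); 1 data, 2 instruction, 3 unified. */
unsigned leaf4_cache_type(uint32_t eax)  { return eax & 0x1Fu; }

/* EAX[7:5]: cache level, starting at 1. */
unsigned leaf4_cache_level(uint32_t eax) { return (eax >> 5) & 0x7u; }
```

For instance, hypothetical values EBX = 01C0003FH and ECX = 0000003FH encode 8 ways, 1 partition, 64-byte lines, and 64 sets. They give 8 * 1 * 64 * 64 = 32768 bytes, a 32-KByte cache.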


The CPUID leaf 04H also reports data that can be used to derive the topology of processor cores in a physical 
package. This information is constant for all valid index values. Software can query the raw data reported by 
executing CPUID with EAX=04H and ECX=0 and use it as part of the topology enumeration algorithm described in 
Chapter 8, "Multiple-Processor Management," in the Intel® 64 and IA-32 Architectures Software Developer's 
Manual, Volume 3A. 

INPUT EAX = 05H: Returns MONITOR and MWAIT Features

When CPUID executes with EAX set to 05H, the processor returns information about features available to
MONITOR/MWAIT instructions. The MONITOR instruction is used for address-range monitoring in conjunction with
the MWAIT instruction. The MWAIT instruction optionally provides additional extensions for advanced power
management. See Table 3-8.

INPUT EAX = 06H: Returns Thermal and Power Management Features 

When CPUID executes with EAX set to 06H, the processor returns information about thermal and power
management features. See Table 3-8.

INPUT EAX = 07H: Returns Structured Extended Feature Enumeration Information 

When CPUID executes with EAX set to 07H and ECX = 0, the processor returns information about the maximum 
input value for sub-leaves that contain extended feature flags. See Table 3-8. 

When CPUID executes with EAX set to 07H and the input value of ECX is invalid (see leaf 07H entry in Table 3-8),
the processor returns 0 in EAX/EBX/ECX/EDX. In sub-leaf 0, EAX returns the maximum input value of ECX (the
highest valid leaf 07H sub-leaf index), and EBX, ECX, and EDX contain extended feature flags.
INPUT EAX = 09H: Returns Direct Cache Access Information

When CPUID executes with EAX set to 09H, the processor returns information about Direct Cache Access
capabilities. See Table 3-8.

INPUT EAX = 0AH: Returns Architectural Performance Monitoring Features

When CPUID executes with EAX set to 0AH, the processor returns information about support for architectural
performance monitoring capabilities. Architectural performance monitoring is supported if the version ID (see
Table 3-8) is greater than 0. See Table 3-8.

For each version of architectural performance monitoring capability, software must enumerate this leaf to discover
the programming facilities and the architectural performance events available in the processor. The details are
described in Chapter 18, "Performance Monitoring," in the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 3B.

INPUT EAX = 0BH: Returns Extended Topology Information

When CPUID executes with EAX set to 0BH, the processor returns information about extended topology
enumeration data. Software must detect the presence of CPUID leaf 0BH by verifying (a) the highest leaf index
supported by CPUID is >= 0BH, and (b) CPUID.0BH:EBX[15:0] reports a non-zero value. See Table 3-8.
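
The two-step detection rule can be sketched as a small predicate over raw register values. Check (a) has to come first because, per the DEFAULT case of the Operation pseudocode, an input above the highest basic leaf returns data for the highest basic leaf rather than zeros. The function name leaf_0b_supported is hypothetical; max_basic_leaf is assumed to be CPUID.(EAX=0):EAX and leaf_0b_ebx to be CPUID.(EAX=0BH, ECX=0):EBX.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when leaf 0BH may be used:
 * (a) the highest basic leaf (CPUID.0:EAX) is at least 0BH, and
 * (b) CPUID.0BH:EBX[15:0] reports a non-zero value. */
static bool leaf_0b_supported(uint32_t max_basic_leaf, uint32_t leaf_0b_ebx) {
    if (max_basic_leaf < 0x0Bu)
        return false;                     /* leaf 0BH is beyond the supported range */
    return (leaf_0b_ebx & 0xFFFFu) != 0;  /* EBX[15:0] must be non-zero */
}
```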

INPUT EAX = 0DH: Returns Processor Extended States Enumeration Information

When CPUID executes with EAX set to 0DH and ECX = 0, the processor returns information about the bit-vector
representation of all processor state extensions that are supported in the processor and storage size requirements
of the XSAVE/XRSTOR area. See Table 3-8.

When CPUID executes with EAX set to 0DH and ECX = n (n > 1, and is a valid sub-leaf index), the processor returns
information about the size and offset of each processor extended state save area within the XSAVE/XRSTOR area. 
See Table 3-8. Software can use the forward-extendable technique depicted below to query the valid sub-leaves 
and obtain size and offset information for each processor extended state save area: 

For i = 2 to 62 // sub-leaf 1 is reserved
    IF (CPUID.(EAX=0DH, ECX=0):VECTOR[i] = 1 ) // VECTOR is the 64-bit value of EDX:EAX
        Execute CPUID.(EAX=0DH, ECX = i) to examine size and offset for sub-leaf i;
    FI;
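
The loop above can be sketched in C, assuming the caller has already executed CPUID.(EAX=0DH, ECX=0) and passes in the returned EAX and EDX. The function name xsave_subleaves is hypothetical; it only selects which sub-leaves to query next and does not execute CPUID itself.

```c
#include <stdint.h>

/* Collects the sub-leaf indices i in [2, 62] whose bit is set in
 * VECTOR = EDX:EAX from CPUID.(EAX=0DH, ECX=0).
 * Returns how many indices were written to out (at most 61). */
static int xsave_subleaves(uint32_t eax, uint32_t edx, int out[61]) {
    uint64_t vector = ((uint64_t)edx << 32) | eax;  /* VECTOR = EDX:EAX */
    int n = 0;
    for (int i = 2; i <= 62; i++) {                  /* sub-leaf 1 is reserved */
        if ((vector >> i) & 1u)
            out[n++] = i;                            /* query CPUID.(EAX=0DH, ECX=i) next */
    }
    return n;
}
```

With EAX = E7H and EDX = 0, for example, the sub-leaves to examine are 2, 5, 6 and 7.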

INPUT EAX = 0FH: Returns Intel Resource Director Technology (Intel RDT) Monitoring Enumeration Information

When CPUID executes with EAX set to 0FH and ECX = 0, the processor returns information about the bit-vector
representation of QoS monitoring resource types that are supported in the processor and the maximum range of RMID
values the processor can use to monitor any supported resource type. Each bit, starting from bit 1, corresponds
to a specific resource type if the bit is set. The bit position corresponds to the sub-leaf index (or ResID) that
software must use to query QoS monitoring capability available for that type. See Table 3-8.

When CPUID executes with EAX set to 0FH and ECX = n (n >= 1, and is a valid ResID), the processor returns
information software can use to program the IA32_PQR_ASSOC and IA32_QM_EVTSEL MSRs before reading QoS data
from the IA32_QM_CTR MSR.
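
As a sketch of how the resource-type bit-vector maps to ResIDs: each set bit, starting from bit 1, names a sub-leaf index to query. The same decoding applies to the leaf 10H bit-vector described in the next entry. The function name rdt_resids is hypothetical, and which register carries the bit-vector is given in Table 3-8, so it is passed in as a plain value.

```c
#include <stdint.h>

/* Lists the ResIDs (set bit positions, starting at bit 1) in a resource-type
 * bit-vector from sub-leaf 0 of CPUID leaf 0FH or 10H.
 * Returns how many ResIDs were written to out (at most 31). */
static int rdt_resids(uint32_t bitvec, int out[31]) {
    int n = 0;
    for (int bit = 1; bit < 32; bit++) {   /* bit 0 is skipped: types start at bit 1 */
        if (bitvec & (1u << bit))
            out[n++] = bit;                /* use as ECX = n for the per-type sub-leaf */
    }
    return n;
}
```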

INPUT EAX = 10H: Returns Intel Resource Director Technology (Intel RDT) Allocation Enumeration Information 

When CPUID executes with EAX set to 10H and ECX = 0, the processor returns information about the bit-vector
representation of QoS Enforcement resource types that are supported in the processor. Each bit, starting from bit
1, corresponds to a specific resource type if the bit is set. The bit position corresponds to the sub-leaf index (or
ResID) that software must use to query QoS enforcement capability available for that type. See Table 3-8.

When CPUID executes with EAX set to 10H and ECX = n (n >= 1, and is a valid ResID), the processor returns
information about available classes of service and the range of QoS mask MSRs that software can use to configure
each class of service using capability bit masks in the QoS Mask registers, IA32_resourceType_Mask_n.

INPUT EAX = 12H: Returns Intel SGX Enumeration Information

When CPUID executes with EAX set to 12H and ECX = 0H, the processor returns information about Intel SGX
capabilities. See Table 3-8.

When CPUID executes with EAX set to 12H and ECX = 1H, the processor returns information about Intel SGX
attributes. See Table 3-8.

When CPUID executes with EAX set to 12H and ECX = n (n > 1), the processor returns information about Intel SGX 
Enclave Page Cache. See Table 3-8. 

INPUT EAX = 14H: Returns Intel Processor Trace Enumeration Information 

When CPUID executes with EAX set to 14H and ECX = 0H, the processor returns information about Intel Processor
Trace extensions. See Table 3-8.

When CPUID executes with EAX set to 14H and ECX = n (n > 0 and less than the number of non-zero bits in
CPUID.(EAX=14H, ECX=0H).EAX), the processor returns information about packet generation in Intel Processor
Trace. See Table 3-8.

INPUT EAX = 15H: Returns Time Stamp Counter and Nominal Core Crystal Clock Information 

When CPUID executes with EAX set to 15H and ECX = 0H, the processor returns information about Time Stamp
Counter and Core Crystal Clock. See Table 3-8.
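
A sketch of how the leaf 15H values combine into a TSC frequency. The register roles are taken from the leaf 15H entry of Table 3-8 and are an assumption with respect to this section, which does not restate them: EAX is the denominator and EBX the numerator of the TSC/core crystal clock ratio, and ECX is the nominal core crystal clock frequency in Hz, with a zero in any of them meaning the value is not enumerated. The function name is hypothetical.

```c
#include <stdint.h>

/* TSC frequency in Hz = ECX * EBX / EAX for CPUID.15H.
 * Returns 0 when any required value is reported as zero (not enumerated). */
static uint64_t tsc_hz_from_leaf15(uint32_t eax, uint32_t ebx, uint32_t ecx) {
    if (eax == 0 || ebx == 0 || ecx == 0)
        return 0;
    return (uint64_t)ecx * ebx / eax;   /* 64-bit product avoids overflow */
}
```

For example, a 24 MHz crystal with a ratio of 176/2 gives 2112000000 Hz.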

INPUT EAX = 16H: Returns Processor Frequency Information 

When CPUID executes with EAX set to 16H, the processor returns processor frequency information. See Table 3-8.

INPUT EAX = 17H: Returns System-On-Chip Information 

When CPUID executes with EAX set to 17H, the processor returns information about the System-On-Chip Vendor 
Attribute Enumeration. See Table 3-8. 

METHODS FOR RETURNING BRANDING INFORMATION 

Use the following techniques to access branding information: 

1. Processor brand string method. 

2. Processor brand index; this method uses a software supplied brand string table. 

These two methods are discussed in the following sections. For methods that are available in early processors, see
Section: "Identification of Earlier IA-32 Processors" in Chapter 19 of the Intel® 64 and IA-32 Architectures
Software Developer's Manual, Volume 1.

The Processor Brand String Method 

Figure 3-9 describes the algorithm used for detection of the brand string. Processor brand identification software 
should execute this algorithm on all Intel 64 and IA-32 processors. 

This method (introduced with Pentium 4 processors) returns an ASCII brand identification string and the Processor 
Base frequency of the processor to the EAX, EBX, ECX, and EDX registers. 

Figure 3-9. Determination of Support for the Processor Brand String


How Brand Strings Work 

To use the brand string method, execute CPUID with EAX input of 80000002H through 80000004H. For each input
value, CPUID returns 16 ASCII characters using EAX, EBX, ECX, and EDX. The returned string will be
NULL-terminated.

Table 3-13 shows the brand string that is returned by the first processor in the Pentium 4 processor family.


Table 3-13. Processor Brand String Returned with Pentium 4 Processor

EAX Input Value   Return Values      ASCII Equivalent
80000002H         EAX = 20202020H    "    "
                  EBX = 20202020H    "    "
                  ECX = 20202020H    "    "
                  EDX = 6E492020H    "nI  "
80000003H         EAX = 286C6574H    "(let"
                  EBX = 50202952H    "P )R"
                  ECX = 69746E65H    "itne"
                  EDX = 52286D75H    "R(mu"
80000004H         EAX = 20342029H    " 4 )"
                  EBX = 20555043H    " UPC"
                  ECX = 30303531H    "0051"
                  EDX = 007A484DH    "\0zHM"

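The ASCII column of Table 3-13 reads each register most significant byte first, but the characters are stored little-endian: the lowest byte of EAX from 80000002H is the first character of the string. As a sketch, the hypothetical helper below rebuilds the 48-byte string from the twelve register values (EAX, EBX, ECX, EDX for each of the three leaves, in that order). With the values in Table 3-13 it yields "Intel(R) Pentium(R) 4 CPU 1500MHz" preceded by 14 spaces.

```c
#include <stdint.h>

/* regs[] holds EAX, EBX, ECX, EDX from 80000002H, then 80000003H, then
 * 80000004H. Each register contributes 4 characters, lowest byte first. */
static void brand_string_from_regs(const uint32_t regs[12], char out[49]) {
    for (int r = 0; r < 12; r++) {
        for (int b = 0; b < 4; b++)
            out[r * 4 + b] = (char)((regs[r] >> (8 * b)) & 0xFFu);
    }
    out[48] = '\0';   /* guard terminator in case the string fills all 48 bytes */
}
```
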

Extracting the Processor Frequency from Brand Strings 

Figure 3-10 provides an algorithm which software can use to extract the Processor Base frequency from the 
processor brand string. 





Figure 3-10. Algorithm for Extracting Processor Frequency 
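
Figure 3-10 is not reproduced here. The following is a simplified sketch in the same spirit, not a transcription of the figure: it looks for an "MHz", "GHz", or "THz" suffix, walks back over the digits and decimal point in front of it, and scales the number to MHz. The function name, the forward search with strstr, and the choice to return MHz as a double are assumptions.

```c
#include <stdlib.h>
#include <string.h>

/* Returns the frequency in MHz encoded in a brand string, or 0.0 if no
 * "MHz"/"GHz"/"THz" suffix preceded by a number is found. */
static double brand_freq_mhz(const char *brand) {
    static const struct { const char *unit; double mult; } units[] = {
        {"MHz", 1.0}, {"GHz", 1e3}, {"THz", 1e6}
    };
    for (size_t u = 0; u < sizeof units / sizeof units[0]; u++) {
        const char *p = strstr(brand, units[u].unit);
        if (p == NULL || p == brand)
            continue;
        /* Walk back over the digits and '.' immediately before the unit. */
        const char *start = p;
        while (start > brand &&
               (start[-1] == '.' || (start[-1] >= '0' && start[-1] <= '9')))
            start--;
        if (start == p)
            continue;                          /* unit not preceded by a number */
        return strtod(start, NULL) * units[u].mult;  /* strtod stops at the unit */
    }
    return 0.0;
}
```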

The Processor Brand Index Method

The brand index method (introduced with Pentium® III Xeon® processors) provides an entry point into a brand
identification table that is maintained in memory by system software and is accessible from system- and user-level
code. In this table, each brand index is associated with an ASCII brand identification string that identifies the
official Intel family and model number of a processor.

When CPUID executes with EAX set to 1, the processor returns a brand index to the low byte in EBX. Software can
then use this index to locate the brand identification string for the processor in the brand identification table. The
first entry (brand index 0) in this table is reserved, allowing for backward compatibility with processors that do not
support the brand identification feature. Starting with processor signature family ID = 0FH, model = 03H, the brand
index method is no longer supported; use the brand string method instead.

Table 3-14 shows brand indices that have identification strings associated with them. 


Table 3-14. Mapping of Brand Indices and Intel 64 and IA-32 Processor Brand Strings

Brand Index   Brand String
00H           This processor does not support the brand identification feature
01H           Intel(R) Celeron(R) processor¹
02H           Intel(R) Pentium(R) III processor¹
03H           Intel(R) Pentium(R) III Xeon(R) processor; If processor signature = 000006B1h, then Intel(R) Celeron(R)
              processor
04H           Intel(R) Pentium(R) III processor
06H           Mobile Intel(R) Pentium(R) III processor-M
07H           Mobile Intel(R) Celeron(R) processor¹
08H           Intel(R) Pentium(R) 4 processor
09H           Intel(R) Pentium(R) 4 processor
0AH           Intel(R) Celeron(R) processor¹
0BH           Intel(R) Xeon(R) processor; If processor signature = 00000F13h, then Intel(R) Xeon(R) processor MP
0CH           Intel(R) Xeon(R) processor MP
0EH           Mobile Intel(R) Pentium(R) 4 processor-M; If processor signature = 00000F13h, then Intel(R) Xeon(R)
              processor
0FH           Mobile Intel(R) Celeron(R) processor¹
11H           Mobile Genuine Intel(R) processor
12H           Intel(R) Celeron(R) M processor
13H           Mobile Intel(R) Celeron(R) processor¹
14H           Intel(R) Celeron(R) processor
15H           Mobile Genuine Intel(R) processor
16H           Intel(R) Pentium(R) M processor
17H           Mobile Intel(R) Celeron(R) processor¹
18H-0FFH      RESERVED

NOTES:
1. Indicates versions of these processors that were introduced after the Pentium III.
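
A minimal sketch of a brand index lookup that honors the two signature-dependent rows of Table 3-14 and the family 0FH, model 03H cutoff described above. It is hypothetical and covers only a few table rows; it assumes the processor signature is the full CPUID.01H:EAX value and reads family and model from EAX[11:8] and EAX[7:4].

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup covering a few rows of Table 3-14. The brand index is
 * CPUID.01H:EBX[7:0]; leaf1_eax is treated as the processor signature.
 * Returns NULL when no brand string applies. */
static const char *brand_from_index(uint32_t leaf1_eax, uint32_t leaf1_ebx) {
    uint32_t model  = (leaf1_eax >> 4) & 0xFu;   /* EAX[7:4]  */
    uint32_t family = (leaf1_eax >> 8) & 0xFu;   /* EAX[11:8] */
    uint32_t index  = leaf1_ebx & 0xFFu;         /* EBX[7:0]  */

    /* Family 0FH, model 03H and later: brand index method not supported. */
    if (family == 0xFu && model >= 0x3u)
        return NULL;

    switch (index) {
    case 0x01: return "Intel(R) Celeron(R) processor";
    case 0x02: return "Intel(R) Pentium(R) III processor";
    case 0x03: return leaf1_eax == 0x000006B1u ? "Intel(R) Celeron(R) processor"
                                               : "Intel(R) Pentium(R) III Xeon(R) processor";
    case 0x0B: return leaf1_eax == 0x00000F13u ? "Intel(R) Xeon(R) processor MP"
                                               : "Intel(R) Xeon(R) processor";
    case 0x16: return "Intel(R) Pentium(R) M processor";
    default:   return NULL;   /* 00H (not supported), reserved, or not covered here */
    }
}
```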

IA-32 Architecture Compatibility 

CPUID is not supported in early models of the Intel486 processor or in any IA-32 processor earlier than the 
Intel486 processor. 

Operation

IA32_BIOS_SIGN_ID MSR ← Update with installed microcode revision number;

CASE (EAX) OF
    EAX = 0:
        EAX ← Highest basic function input value understood by CPUID;
        EBX ← Vendor identification string;
        EDX ← Vendor identification string;
        ECX ← Vendor identification string;
    BREAK;
    EAX = 1H:
        EAX[3:0] ← Stepping ID;
        EAX[7:4] ← Model;
        EAX[11:8] ← Family;
        EAX[13:12] ← Processor type;
        EAX[15:14] ← Reserved;
        EAX[19:16] ← Extended Model;
        EAX[27:20] ← Extended Family;
        EAX[31:28] ← Reserved;
        EBX[7:0] ← Brand Index; (* Reserved if the value is zero. *)
        EBX[15:8] ← CLFLUSH Line Size;
        EBX[23:16] ← Reserved; (* Number of threads enabled = 2 if MT enable fuse set. *)
        EBX[31:24] ← Initial APIC ID;
        ECX ← Feature flags; (* See Figure 3-7. *)
        EDX ← Feature flags; (* See Figure 3-8. *)
    BREAK;
    EAX = 2H:
        EAX ← Cache and TLB information;
        EBX ← Cache and TLB information;
        ECX ← Cache and TLB information;
        EDX ← Cache and TLB information;
    BREAK;
    EAX = 3H:
        EAX ← Reserved;
        EBX ← Reserved;
        ECX ← ProcessorSerialNumber[31:0];
        (* Pentium III processors only, otherwise reserved. *)
        EDX ← ProcessorSerialNumber[63:32];
        (* Pentium III processors only, otherwise reserved. *)
    BREAK;
    EAX = 4H:
        EAX ← Deterministic Cache Parameters Leaf; (* See Table 3-8. *)
        EBX ← Deterministic Cache Parameters Leaf;
        ECX ← Deterministic Cache Parameters Leaf;
        EDX ← Deterministic Cache Parameters Leaf;
    BREAK;
    EAX = 5H:
        EAX ← MONITOR/MWAIT Leaf; (* See Table 3-8. *)
        EBX ← MONITOR/MWAIT Leaf;
        ECX ← MONITOR/MWAIT Leaf;
        EDX ← MONITOR/MWAIT Leaf;
    BREAK;
    EAX = 6H:
        EAX ← Thermal and Power Management Leaf; (* See Table 3-8. *)
        EBX ← Thermal and Power Management Leaf;
        ECX ← Thermal and Power Management Leaf;
        EDX ← Thermal and Power Management Leaf;
    BREAK;
    EAX = 7H:
        EAX ← Structured Extended Feature Flags Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Structured Extended Feature Flags Enumeration Leaf;
        ECX ← Structured Extended Feature Flags Enumeration Leaf;
        EDX ← Structured Extended Feature Flags Enumeration Leaf;
    BREAK;
    EAX = 8H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = 9H:
        EAX ← Direct Cache Access Information Leaf; (* See Table 3-8. *)
        EBX ← Direct Cache Access Information Leaf;
        ECX ← Direct Cache Access Information Leaf;
        EDX ← Direct Cache Access Information Leaf;
    BREAK;
    EAX = AH:
        EAX ← Architectural Performance Monitoring Leaf; (* See Table 3-8. *)
        EBX ← Architectural Performance Monitoring Leaf;
        ECX ← Architectural Performance Monitoring Leaf;
        EDX ← Architectural Performance Monitoring Leaf;
    BREAK;
    EAX = BH:
        EAX ← Extended Topology Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Extended Topology Enumeration Leaf;
        ECX ← Extended Topology Enumeration Leaf;
        EDX ← Extended Topology Enumeration Leaf;
    BREAK;
    EAX = CH:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = DH:
        EAX ← Processor Extended State Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Processor Extended State Enumeration Leaf;
        ECX ← Processor Extended State Enumeration Leaf;
        EDX ← Processor Extended State Enumeration Leaf;
    BREAK;
    EAX = EH:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = FH:
        EAX ← Intel Resource Director Technology Monitoring Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Intel Resource Director Technology Monitoring Enumeration Leaf;
        ECX ← Intel Resource Director Technology Monitoring Enumeration Leaf;
        EDX ← Intel Resource Director Technology Monitoring Enumeration Leaf;
    BREAK;
    EAX = 10H:
        EAX ← Intel Resource Director Technology Allocation Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Intel Resource Director Technology Allocation Enumeration Leaf;
        ECX ← Intel Resource Director Technology Allocation Enumeration Leaf;
        EDX ← Intel Resource Director Technology Allocation Enumeration Leaf;
    BREAK;
    EAX = 12H:
        EAX ← Intel SGX Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Intel SGX Enumeration Leaf;
        ECX ← Intel SGX Enumeration Leaf;
        EDX ← Intel SGX Enumeration Leaf;
    BREAK;
    EAX = 14H:
        EAX ← Intel Processor Trace Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Intel Processor Trace Enumeration Leaf;
        ECX ← Intel Processor Trace Enumeration Leaf;
        EDX ← Intel Processor Trace Enumeration Leaf;
    BREAK;
    EAX = 15H:
        EAX ← Time Stamp Counter and Nominal Core Crystal Clock Information Leaf; (* See Table 3-8. *)
        EBX ← Time Stamp Counter and Nominal Core Crystal Clock Information Leaf;
        ECX ← Time Stamp Counter and Nominal Core Crystal Clock Information Leaf;
        EDX ← Time Stamp Counter and Nominal Core Crystal Clock Information Leaf;
    BREAK;
    EAX = 16H:
        EAX ← Processor Frequency Information Enumeration Leaf; (* See Table 3-8. *)
        EBX ← Processor Frequency Information Enumeration Leaf;
        ECX ← Processor Frequency Information Enumeration Leaf;
        EDX ← Processor Frequency Information Enumeration Leaf;
    BREAK;
    EAX = 17H:
        EAX ← System-On-Chip Vendor Attribute Enumeration Leaf; (* See Table 3-8. *)
        EBX ← System-On-Chip Vendor Attribute Enumeration Leaf;
        ECX ← System-On-Chip Vendor Attribute Enumeration Leaf;
        EDX ← System-On-Chip Vendor Attribute Enumeration Leaf;
    BREAK;
    EAX = 80000000H:
        EAX ← Highest extended function input value understood by CPUID;
        EBX ← Reserved;
        ECX ← Reserved;
        EDX ← Reserved;
    BREAK;
    EAX = 80000001H:
        EAX ← Reserved;
        EBX ← Reserved;
        ECX ← Extended Feature Bits; (* See Table 3-8. *)
        EDX ← Extended Feature Bits; (* See Table 3-8. *)
    BREAK;
    EAX = 80000002H:
        EAX ← Processor Brand String;
        EBX ← Processor Brand String, continued;
        ECX ← Processor Brand String, continued;
        EDX ← Processor Brand String, continued;
    BREAK;
    EAX = 80000003H:
        EAX ← Processor Brand String, continued;
        EBX ← Processor Brand String, continued;
        ECX ← Processor Brand String, continued;
        EDX ← Processor Brand String, continued;
    BREAK;
    EAX = 80000004H:
        EAX ← Processor Brand String, continued;
        EBX ← Processor Brand String, continued;
        ECX ← Processor Brand String, continued;
        EDX ← Processor Brand String, continued;
    BREAK;
    EAX = 80000005H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX = 80000006H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Cache information;
        EDX ← Reserved = 0;
    BREAK;
    EAX = 80000007H:
        EAX ← Reserved = 0;
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Misc Feature Flags; (* See Table 3-8. *)
    BREAK;
    EAX = 80000008H:
        EAX ← Physical and Linear Address Size Information; (* See Table 3-8. *)
        EBX ← Reserved = 0;
        ECX ← Reserved = 0;
        EDX ← Reserved = 0;
    BREAK;
    EAX >= 40000000H and EAX <= 4FFFFFFFH:
    DEFAULT: (* EAX = Value outside of recognized range for CPUID. *)
        (* If the highest basic information leaf data depend on ECX input value, ECX is honored. *)
        EAX ← Reserved; (* Information returned for highest basic information leaf. *)
        EBX ← Reserved; (* Information returned for highest basic information leaf. *)
        ECX ← Reserved; (* Information returned for highest basic information leaf. *)
        EDX ← Reserved; (* Information returned for highest basic information leaf. *)
    BREAK;
ESAC;
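
The EAX fields assigned for EAX = 1H in the pseudocode above can be unpacked as sketched below. The rule for combining them into a displayed family and model (adding Extended Family only when Family is 0FH, and prefixing Extended Model only when Family is 06H or 0FH) is the convention given with the CPUID leaf 01H processor signature elsewhere in this manual rather than in this section. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct cpu_signature {         /* hypothetical container for CPUID.01H:EAX fields */
    unsigned stepping;         /* EAX[3:0]   */
    unsigned model;            /* EAX[7:4]   */
    unsigned family;           /* EAX[11:8]  */
    unsigned type;             /* EAX[13:12] */
    unsigned ext_model;        /* EAX[19:16] */
    unsigned ext_family;       /* EAX[27:20] */
    unsigned display_family;   /* combined family */
    unsigned display_model;    /* combined model  */
};

static struct cpu_signature decode_leaf1_eax(uint32_t eax) {
    struct cpu_signature s;
    s.stepping   = eax & 0xFu;
    s.model      = (eax >> 4) & 0xFu;
    s.family     = (eax >> 8) & 0xFu;
    s.type       = (eax >> 12) & 0x3u;
    s.ext_model  = (eax >> 16) & 0xFu;
    s.ext_family = (eax >> 20) & 0xFFu;

    /* Extended Family only matters when Family is 0FH. */
    s.display_family = (s.family == 0xFu) ? s.family + s.ext_family : s.family;
    /* Extended Model only matters when Family is 06H or 0FH. */
    s.display_model = (s.family == 0x6u || s.family == 0xFu)
                          ? (s.ext_model << 4) + s.model
                          : s.model;
    return s;
}
```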

Flags Affected 

None. 

Exceptions (All Operating Modes)

#UD If the LOCK prefix is used.

In earlier IA-32 processors that do not support the CPUID instruction, execution of the instruction results in
an invalid opcode (#UD) exception being generated.

CRC32 — Accumulate CRC32 Value

Opcode/Instruction       Op/En  64-Bit Mode  Compat/Leg Mode  Description
F2 0F 38 F0 /r           RM     Valid        Valid            Accumulate CRC32 on r/m8.
CRC32 r32, r/m8
F2 REX 0F 38 F0 /r       RM     Valid        N.E.             Accumulate CRC32 on r/m8.
CRC32 r32, r/m8*
F2 0F 38 F1 /r           RM     Valid        Valid            Accumulate CRC32 on r/m16.
CRC32 r32, r/m16
F2 0F 38 F1 /r           RM     Valid        Valid            Accumulate CRC32 on r/m32.
CRC32 r32, r/m32
F2 REX.W 0F 38 F0 /r     RM     Valid        N.E.             Accumulate CRC32 on r/m8.
CRC32 r64, r/m8
F2 REX.W 0F 38 F1 /r     RM     Valid        N.E.             Accumulate CRC32 on r/m64.
CRC32 r64, r/m64

NOTES:
*In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En   Operand 1          Operand 2       Operand 3   Operand 4
RM      ModRM:reg (r, w)   ModRM:r/m (r)   NA          NA


Description 

Starting with an initial value in the first operand (destination operand), accumulates a CRC32 (polynomial
11EDC6F41H) value for the second operand (source operand) and stores the result in the destination operand. The
source operand can be a register or a memory location. The destination operand must be an r32 or r64 register. If
the destination is an r64 register, then the 32-bit result is stored in the least significant doubleword and
00000000H is stored in the most significant doubleword of the r64 register.

The initial value supplied in the destination operand is a double word integer stored in the r32 register or the least 
significant double word of the r64 register. To incrementally accumulate a CRC32 value, software retains the result 
of the previous CRC32 operation in the destination operand, then executes the CRC32 instruction again with new 
input data in the source operand. Data contained in the source operand is processed in reflected bit order. This
means that the most significant bit of the source operand is treated as the least significant bit of the dividend, and
so on, for all the bits of the source operand. Likewise, the result of the CRC operation is stored in the destination
operand in reflected bit order. This means that the most significant bit of the resulting CRC (bit 31) is stored in the 
least significant bit of the destination operand (bit 0), and so on, for all the bits of the CRC. 

Operation 

Notes:
    BIT_REFLECT64: DST[63-0] = SRC[0-63]
    BIT_REFLECT32: DST[31-0] = SRC[0-31]
    BIT_REFLECT16: DST[15-0] = SRC[0-15]
    BIT_REFLECT8: DST[7-0] = SRC[0-7]
    MOD2: Remainder from Polynomial division modulus 2

CRC32 instruction for 64-bit source operand and 64-bit destination operand:
    TEMP1[63-0] ← BIT_REFLECT64 (SRC[63-0])
    TEMP2[31-0] ← BIT_REFLECT32 (DEST[31-0])
    TEMP3[95-0] ← TEMP1[63-0] << 32
    TEMP4[95-0] ← TEMP2[31-0] << 64
    TEMP5[95-0] ← TEMP3[95-0] XOR TEMP4[95-0]
    TEMP6[31-0] ← TEMP5[95-0] MOD2 11EDC6F41H
    DEST[31-0] ← BIT_REFLECT32 (TEMP6[31-0])
    DEST[63-32] ← 00000000H

CRC32 instruction for 32-bit source operand and 32-bit destination operand:
    TEMP1[31-0] ← BIT_REFLECT32 (SRC[31-0])
    TEMP2[31-0] ← BIT_REFLECT32 (DEST[31-0])
    TEMP3[63-0] ← TEMP1[31-0] << 32
    TEMP4[63-0] ← TEMP2[31-0] << 32
    TEMP5[63-0] ← TEMP3[63-0] XOR TEMP4[63-0]
    TEMP6[31-0] ← TEMP5[63-0] MOD2 11EDC6F41H
    DEST[31-0] ← BIT_REFLECT32 (TEMP6[31-0])

CRC32 instruction for 16-bit source operand and 32-bit destination operand:
    TEMP1[15-0] ← BIT_REFLECT16 (SRC[15-0])
    TEMP2[31-0] ← BIT_REFLECT32 (DEST[31-0])
    TEMP3[47-0] ← TEMP1[15-0] << 32
    TEMP4[47-0] ← TEMP2[31-0] << 16
    TEMP5[47-0] ← TEMP3[47-0] XOR TEMP4[47-0]
    TEMP6[31-0] ← TEMP5[47-0] MOD2 11EDC6F41H
    DEST[31-0] ← BIT_REFLECT32 (TEMP6[31-0])

CRC32 instruction for 8-bit source operand and 64-bit destination operand:
    TEMP1[7-0] ← BIT_REFLECT8 (SRC[7-0])
    TEMP2[31-0] ← BIT_REFLECT32 (DEST[31-0])
    TEMP3[39-0] ← TEMP1[7-0] << 32
    TEMP4[39-0] ← TEMP2[31-0] << 8
    TEMP5[39-0] ← TEMP3[39-0] XOR TEMP4[39-0]
    TEMP6[31-0] ← TEMP5[39-0] MOD2 11EDC6F41H
    DEST[31-0] ← BIT_REFLECT32 (TEMP6[31-0])
    DEST[63-32] ← 00000000H

CRC32 instruction for 8-bit source operand and 32-bit destination operand:
    TEMP1[7-0] ← BIT_REFLECT8 (SRC[7-0])
    TEMP2[31-0] ← BIT_REFLECT32 (DEST[31-0])
    TEMP3[39-0] ← TEMP1[7-0] << 32
    TEMP4[39-0] ← TEMP2[31-0] << 8
    TEMP5[39-0] ← TEMP3[39-0] XOR TEMP4[39-0]
    TEMP6[31-0] ← TEMP5[39-0] MOD2 11EDC6F41H
    DEST[31-0] ← BIT_REFLECT32 (TEMP6[31-0])
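
To make the reflected-bit-order description concrete, the 8-bit source / 32-bit destination case can be transcribed step by step into C and compared with the common right-shifting formulation of the same reflected CRC (0x82F63B78 is 11EDC6F41H with the implicit top bit dropped and the remaining 32 bits reversed). This is a sketch for checking the pseudocode, not how software would normally compute CRC32; all function names are hypothetical.

```c
#include <stdint.h>

/* BIT_REFLECTn: reverse the low 'width' bits of v. */
static uint64_t bit_reflect(uint64_t v, int width) {
    uint64_t r = 0;
    for (int i = 0; i < width; i++)
        if ((v >> i) & 1u)
            r |= (uint64_t)1 << (width - 1 - i);
    return r;
}

/* MOD2: remainder of GF(2) division of a 'bits'-bit dividend by 11EDC6F41H. */
static uint32_t mod2(uint64_t dividend, int bits) {
    const uint64_t poly = 0x11EDC6F41ull;
    for (int i = bits - 1; i >= 32; i--)
        if ((dividend >> i) & 1u)
            dividend ^= poly << (i - 32);   /* clears bit i */
    return (uint32_t)dividend;              /* remainder fits in 32 bits */
}

/* CRC32 r32, r/m8 following the Operation pseudocode literally. */
static uint32_t crc32_u8_literal(uint32_t dest, uint8_t src) {
    uint64_t temp1 = bit_reflect(src, 8);
    uint64_t temp2 = bit_reflect(dest, 32);
    uint64_t temp3 = temp1 << 32;             /* TEMP3[39-0] */
    uint64_t temp4 = temp2 << 8;              /* TEMP4[39-0] */
    uint64_t temp5 = temp3 ^ temp4;           /* TEMP5[39-0] */
    uint32_t temp6 = mod2(temp5, 40);         /* TEMP6[31-0] */
    return (uint32_t)bit_reflect(temp6, 32);  /* DEST[31-0]  */
}

/* The usual reflected, shift-register form of the same step. */
static uint32_t crc32_u8_reflected(uint32_t crc, uint8_t src) {
    crc ^= src;
    for (int k = 0; k < 8; k++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}
```

Both functions agree for every input byte and register value, which is why the instruction can be used as a drop-in accelerator for table-driven CRC-32C code.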

Flags Affected 

None 


3-226 Vol. 2A 


CRC32 


Accumulate CRC32 Value 


INSTRUCTION SET REFERENCE, A-L 


Intel C/C++ Compiler Intrinsic Equivalent 

unsigned int _mm_crc32_u8( unsigned int crc, unsigned char data )
unsigned int _mm_crc32_u16( unsigned int crc, unsigned short data )
unsigned int _mm_crc32_u32( unsigned int crc, unsigned int data )
unsigned __int64 _mm_crc32_u64( unsigned __int64 crc, unsigned __int64 data )
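
A sketch of incremental accumulation with these intrinsics: 4-byte chunks go through _mm_crc32_u32 and the tail through _mm_crc32_u8, which gives the same result as feeding every byte to _mm_crc32_u8 because the source data is processed in reflected (little-endian) order. It is GCC/Clang-specific (the target("sse4.2") attribute and __builtin_cpu_supports), and it falls back to a bitwise loop when SSE4.2 or x86 is unavailable. The initial value FFFFFFFFH and final inversion belong to the standard CRC-32C (Castagnoli) checksum convention; the instruction itself performs neither. Function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__x86_64__) || defined(__i386__)
#include <nmmintrin.h>
#endif

/* Bitwise fallback computing the same reflected update as the instruction. */
static uint32_t crc32c_update_sw(uint32_t crc, const uint8_t *p, size_t n) {
    while (n--) {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#if defined(__x86_64__) || defined(__i386__)
__attribute__((target("sse4.2")))
static uint32_t crc32c_update_hw(uint32_t crc, const uint8_t *p, size_t n) {
    for (; n >= 4; p += 4, n -= 4) {
        uint32_t w;
        memcpy(&w, p, 4);              /* little-endian load of 4 bytes */
        crc = _mm_crc32_u32(crc, w);   /* CRC32 r32, r/m32 */
    }
    for (; n > 0; p++, n--)
        crc = _mm_crc32_u8(crc, *p);   /* CRC32 r32, r/m8 */
    return crc;
}
#endif

/* CRC-32C with initial value ~0 and final inversion. */
static uint32_t crc32c(const void *buf, size_t n) {
    const uint8_t *p = (const uint8_t *)buf;
#if defined(__x86_64__) || defined(__i386__)
    if (__builtin_cpu_supports("sse4.2"))
        return ~crc32c_update_hw(0xFFFFFFFFu, p, n);
#endif
    return ~crc32c_update_sw(0xFFFFFFFFu, p, n);
}
```

With this convention, the ASCII string "123456789" produces the standard CRC-32C check value E3069283H on either path.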

SIMD Floating Point Exceptions 

None 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS or GS segments. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF (fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0. 

If LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP(0) If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#UD If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0. 

If LOCK prefix is used. 

Virtual 8086 Mode Exceptions 

#GP(0) If any part of the operand lies outside of the effective address space from 0 to 0FFFFH.

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF (fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0. 

If LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in Protected Mode. 

64-Bit Mode Exceptions 

#GP(0) If the memory address is in a non-canonical form. 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#PF (fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If CPUID.01H:ECX.SSE4_2 [Bit 20] = 0. 

If LOCK prefix is used. 

CVTDQ2PD—Convert Packed Doubleword Integers to Packed Double-Precision Floating-Point
Values

F3 0F E6 /r
CVTDQ2PD xmm1, xmm2/m64
    Op/En: RM    64/32 bit Mode Support: V/V    CPUID Feature Flag: SSE2
    Convert two packed signed doubleword integers from xmm2/mem to two packed double-precision
    floating-point values in xmm1.

VEX.128.F3.0F.WIG E6 /r
VCVTDQ2PD xmm1, xmm2/m64
    Op/En: RM    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX
    Convert two packed signed doubleword integers from xmm2/mem to two packed double-precision
    floating-point values in xmm1.

VEX.256.F3.0F.WIG E6 /r
VCVTDQ2PD ymm1, xmm2/m128
    Op/En: RM    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX
    Convert four packed signed doubleword integers from xmm2/mem to four packed double-precision
    floating-point values in ymm1.

EVEX.128.F3.0F.W0 E6 /r
VCVTDQ2PD xmm1 {k1}{z}, xmm2/m64/m32bcst
    Op/En: HV    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
    Convert two packed signed doubleword integers from xmm2/m64/m32bcst to two packed double-precision
    floating-point values in xmm1 with writemask k1.

EVEX.256.F3.0F.W0 E6 /r
VCVTDQ2PD ymm1 {k1}{z}, xmm2/m128/m32bcst
    Op/En: HV    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
    Convert four packed signed doubleword integers from xmm2/m128/m32bcst to four packed double-precision
    floating-point values in ymm1 with writemask k1.

EVEX.512.F3.0F.W0 E6 /r
VCVTDQ2PD zmm1 {k1}{z}, ymm2/m256/m32bcst
    Op/En: HV    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX512F
    Convert eight packed signed doubleword integers from ymm2/m256/m32bcst to eight packed double-precision
    floating-point values in zmm1 with writemask k1.


Instruction Operand Encoding

Op/En   Operand 1       Operand 2       Operand 3   Operand 4
RM      ModRM:reg (w)   ModRM:r/m (r)   NA          NA
HV      ModRM:reg (w)   ModRM:r/m (r)   NA          NA


Description 

Converts two, four or eight packed signed doubleword integers in the source operand (the second operand) to two, 
four or eight packed double-precision floating-point values in the destination operand (the first operand). 

EVEX encoded versions: The source operand can be a YMM/XMM/XMM (low 64 bits) register, a 256/128/64-bit
memory location or a 256/128/64-bit vector broadcasted from a 32-bit memory location. The destination operand
is a ZMM/YMM/XMM register conditionally updated with writemask k1. Attempts to encode this instruction with
EVEX embedded rounding are ignored.

VEX.256 encoded version: The source operand is an XMM register or 128-bit memory location. The destination
operand is a YMM register.

VEX.128 encoded version: The source operand is an XMM register or 64-bit memory location. The destination
operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are
zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 64-bit memory location. The destination
operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are
unmodified.

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 

Figure 3-11. CVTDQ2PD (VEX.256 encoded version)


Operation 

VCVTDQ2PD (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
            Convert_Integer_To_Double_Precision_Floating_Point(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VCVTDQ2PD (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0])
                ELSE
                    DEST[i+63:i] ←
                        Convert_Integer_To_Double_Precision_Floating_Point(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VCVTDQ2PD (VEX.256 encoded version)
DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0])
DEST[127:64] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[63:32])
DEST[191:128] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[95:64])
DEST[255:192] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[127:96])
DEST[MAX_VL-1:256] ← 0

VCVTDQ2PD (VEX.128 encoded version)
DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0])
DEST[127:64] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[63:32])
DEST[MAX_VL-1:128] ← 0

CVTDQ2PD (128-bit Legacy SSE version)
DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0])
DEST[127:64] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[63:32])
DEST[MAX_VL-1:128] (unmodified)
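
The EVEX register-source pseudocode above can be mirrored by a scalar C model. The helper below is hypothetical and works on plain arrays instead of vector registers. MAX_VL is assumed to be 512, so the destination holds eight doubles, and the broadcast form is not modeled. The model also shows why no rounding is involved: every 32-bit signed integer is exactly representable as a double.

```c
#include <stdbool.h>
#include <stdint.h>

/* Scalar model of EVEX VCVTDQ2PD for KL = 2, 4 or 8 lanes.
 * use_mask = false models *no writemask*; zeroing selects {z}. */
static void vcvtdq2pd_model(double dest[8], const int32_t src[8], int kl,
                            uint8_t k1, bool use_mask, bool zeroing) {
    for (int j = 0; j < kl; j++) {
        if (!use_mask || ((k1 >> j) & 1u))
            dest[j] = (double)src[j];   /* exact: every int32 fits in a double */
        else if (zeroing)
            dest[j] = 0.0;              /* zeroing-masking */
        /* merging-masking: dest[j] keeps its previous value */
    }
    for (int j = kl; j < 8; j++)
        dest[j] = 0.0;                  /* DEST[MAX_VL-1:VL] <- 0 */
}
```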

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTDQ2PD __m512d _mm512_cvtepi32_pd( __m256i a);
VCVTDQ2PD __m512d _mm512_mask_cvtepi32_pd( __m512d s, __mmask8 k, __m256i a);
VCVTDQ2PD __m512d _mm512_maskz_cvtepi32_pd( __mmask8 k, __m256i a);
VCVTDQ2PD __m256d _mm256_mask_cvtepi32_pd( __m256d s, __mmask8 k, __m128i a);
VCVTDQ2PD __m256d _mm256_maskz_cvtepi32_pd( __mmask8 k, __m128i a);
VCVTDQ2PD __m128d _mm_mask_cvtepi32_pd( __m128d s, __mmask8 k, __m128i a);
VCVTDQ2PD __m128d _mm_maskz_cvtepi32_pd( __mmask8 k, __m128i a);
VCVTDQ2PD __m256d _mm256_cvtepi32_pd (__m128i src)
CVTDQ2PD __m128d _mm_cvtepi32_pd (__m128i src)
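
A minimal usage sketch of the SSE2 form, _mm_cvtepi32_pd, which converts the two low doublewords of an XMM register. It assumes a GCC/Clang-style __SSE2__ macro to detect SSE2 support at compile time and falls back to plain C casts otherwise; the helper name is hypothetical.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Converts two int32 values to two doubles (CVTDQ2PD when SSE2 is available). */
static void cvt_two_i32_to_f64(const int32_t in[2], double out[2]) {
#if defined(__SSE2__)
    __m128i v = _mm_loadl_epi64((const __m128i *)in);  /* low 64 bits = in[0], in[1] */
    _mm_storeu_pd(out, _mm_cvtepi32_pd(v));             /* CVTDQ2PD xmm, xmm */
#else
    out[0] = (double)in[0];
    out[1] = (double)in[1];
#endif
}
```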

Other Exceptions

VEX-encoded instructions, see Exceptions Type 5;

EVEX-encoded instructions, see Exceptions Type E5.

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.

CVTDQ2PS—Convert Packed Doubleword Integers to Packed Single-Precision Floating-Point
Values

0F 5B /r
CVTDQ2PS xmm1, xmm2/m128
    Op/En: RM    64/32 bit Mode Support: V/V    CPUID Feature Flag: SSE2
    Convert four packed signed doubleword integers from xmm2/mem to four packed single-precision
    floating-point values in xmm1.

VEX.128.0F.WIG 5B /r
VCVTDQ2PS xmm1, xmm2/m128
    Op/En: RM    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX
    Convert four packed signed doubleword integers from xmm2/mem to four packed single-precision
    floating-point values in xmm1.

VEX.256.0F.WIG 5B /r
VCVTDQ2PS ymm1, ymm2/m256
    Op/En: RM    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX
    Convert eight packed signed doubleword integers from ymm2/mem to eight packed single-precision
    floating-point values in ymm1.

EVEX.128.0F.W0 5B /r
VCVTDQ2PS xmm1 {k1}{z}, xmm2/m128/m32bcst
    Op/En: FV    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
    Convert four packed signed doubleword integers from xmm2/m128/m32bcst to four packed single-precision
    floating-point values in xmm1 with writemask k1.

EVEX.256.0F.W0 5B /r
VCVTDQ2PS ymm1 {k1}{z}, ymm2/m256/m32bcst
    Op/En: FV    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX512VL AVX512F
    Convert eight packed signed doubleword integers from ymm2/m256/m32bcst to eight packed single-precision
    floating-point values in ymm1 with writemask k1.

EVEX.512.0F.W0 5B /r
VCVTDQ2PS zmm1 {k1}{z}, zmm2/m512/m32bcst{er}
    Op/En: FV    64/32 bit Mode Support: V/V    CPUID Feature Flag: AVX512F
    Convert sixteen packed signed doubleword integers from zmm2/m512/m32bcst to sixteen packed
    single-precision floating-point values in zmm1 with writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
FV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description

Converts four, eight or sixteen packed signed doubleword integers in the source operand to four, eight or sixteen packed single-precision floating-point values in the destination operand.

EVEX encoded versions: The source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is a YMM register. Bits (MAX_VL-1:256) of the corresponding register destination are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding register destination are zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding register destination are unmodified.

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction will #UD.
Operation

VCVTDQ2PS (EVEX encoded versions) when SRC operand is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC); ; refer to Table 2-4 in the Intel® Architecture Instruction Set Extensions Programming Reference
    ELSE
        SET_RM(MXCSR.RM); ; refer to Table 2-4 in the Intel® Architecture Instruction Set Extensions Programming Reference
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VCVTDQ2PS (EVEX encoded versions) when SRC operand is a memory source
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0])
                ELSE
                    DEST[i+31:i] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0
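A scalar C model can make the masking and broadcast rules in the pseudocode above concrete. The sketch below is an illustration only: `vcvtdq2ps_evex_model` and its parameters are hypothetical names, `kl` is the lane count (4, 8 or 16), bit j of `k1` plays the role of k1[j] (pass all ones for *no writemask*), `zeroing` stands for EVEX.z, and `bcst` stands for EVEX.b with a memory source. Rounding is delegated to the C conversion, which on IEEE 754 platforms follows the current rounding mode, so embedded rounding ({er}) is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical scalar model of the EVEX VCVTDQ2PS pseudocode.
   dst/src hold kl 32-bit lanes; DEST[MAX_VL-1:VL] is not modeled. */
static void vcvtdq2ps_evex_model(float *dst, const int32_t *src, int kl,
                                 uint16_t k1, bool zeroing, bool bcst)
{
    for (int j = 0; j < kl; j++) {
        if ((k1 >> j) & 1u) {
            /* Broadcast form always reads the single element SRC[31:0]. */
            int32_t s = bcst ? src[0] : src[j];
            dst[j] = (float)s; /* rounds per current mode on IEEE 754 platforms */
        } else if (zeroing) {
            dst[j] = 0.0f;     /* zeroing-masking */
        }
        /* otherwise merging-masking: dst[j] keeps its previous value */
    }
}
```

For example, with `k1 = 0x5` and four lanes, lanes 1 and 3 keep their old contents under merging-masking but become 0.0f under zeroing-masking.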
VCVTDQ2PS (VEX.256 encoded version)
DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0])
DEST[63:32] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:32])
DEST[95:64] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[95:64])
DEST[127:96] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[127:96])
DEST[159:128] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[159:128])
DEST[191:160] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[191:160])
DEST[223:192] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[223:192])
DEST[255:224] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[255:224])
DEST[MAX_VL-1:256] ← 0

VCVTDQ2PS (VEX.128 encoded version)
DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0])
DEST[63:32] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:32])
DEST[95:64] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[95:64])
DEST[127:96] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[127:96])
DEST[MAX_VL-1:128] ← 0

CVTDQ2PS (128-bit Legacy SSE version)
DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0])
DEST[63:32] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:32])
DEST[95:64] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[95:64])
DEST[127:96] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[127:96])
DEST[MAX_VL-1:128] (unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTDQ2PS __m512 _mm512_cvtepi32_ps( __m512i a);
VCVTDQ2PS __m512 _mm512_mask_cvtepi32_ps( __m512 s, __mmask16 k, __m512i a);
VCVTDQ2PS __m512 _mm512_maskz_cvtepi32_ps( __mmask16 k, __m512i a);
VCVTDQ2PS __m512 _mm512_cvt_roundepi32_ps( __m512i a, int r);
VCVTDQ2PS __m512 _mm512_mask_cvt_roundepi32_ps( __m512 s, __mmask16 k, __m512i a, int r);
VCVTDQ2PS __m512 _mm512_maskz_cvt_roundepi32_ps( __mmask16 k, __m512i a, int r);
VCVTDQ2PS __m256 _mm256_mask_cvtepi32_ps( __m256 s, __mmask8 k, __m256i a);
VCVTDQ2PS __m256 _mm256_maskz_cvtepi32_ps( __mmask8 k, __m256i a);
VCVTDQ2PS __m128 _mm_mask_cvtepi32_ps( __m128 s, __mmask8 k, __m128i a);
VCVTDQ2PS __m128 _mm_maskz_cvtepi32_ps( __mmask8 k, __m128i a);
VCVTDQ2PS __m256 _mm256_cvtepi32_ps (__m256i src)
CVTDQ2PS __m128 _mm_cvtepi32_ps (__m128i src)
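The following is a minimal sketch, not part of the Intel reference, that calls the `_mm_cvtepi32_ps` intrinsic listed above when the compiler defines `__SSE2__` and otherwise falls back to a plain C cast. The helper name `cvt4_epi32_ps` is hypothetical. It assumes the default MXCSR state (round to nearest, ties to even); under that assumption 16777217 (2^24 + 1) has no exact single-precision representation and comes back as 16777216.0f, which is the kind of inexact result that signals the Precision exception.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h> /* SSE2: _mm_loadu_si128, _mm_cvtepi32_ps, _mm_storeu_ps */
#endif

/* Hypothetical helper: convert four signed doublewords to four floats. */
static void cvt4_epi32_ps(const int32_t in[4], float out[4])
{
#if defined(__SSE2__)
    __m128i v = _mm_loadu_si128((const __m128i *)in); /* unaligned 128-bit load */
    _mm_storeu_ps(out, _mm_cvtepi32_ps(v));           /* CVTDQ2PS, rounds per MXCSR.RC */
#else
    /* Portable fallback: on IEEE 754 platforms, int-to-float conversion
       rounds according to the current floating-point rounding mode. */
    for (int i = 0; i < 4; i++)
        out[i] = (float)in[i];
#endif
}
```

The fallback reproduces only the converted values; it does not report the Precision flag the way the instruction does.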

SIMD Floating-Point Exceptions 

Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2; 

EVEX-encoded instructions, see Exceptions Type E2. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
CVTPD2DQ—Convert Packed Double-Precision Floating-Point Values to Packed Doubleword Integers


Opcode | Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F E6 /r | CVTPD2DQ xmm1, xmm2/m128 | RM | V/V | SSE2 | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1.
VEX.128.F2.0F.WIG E6 /r | VCVTPD2DQ xmm1, xmm2/m128 | RM | V/V | AVX | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1.
VEX.256.F2.0F.WIG E6 /r | VCVTPD2DQ xmm1, ymm2/m256 | RM | V/V | AVX | Convert four packed double-precision floating-point values in ymm2/mem to four signed doubleword integers in xmm1.
EVEX.128.F2.0F.W1 E6 /r | VCVTPD2DQ xmm1 {k1}{z}, xmm2/m128/m64bcst | FV | V/V | AVX512VL AVX512F | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two signed doubleword integers in xmm1 subject to writemask k1.
EVEX.256.F2.0F.W1 E6 /r | VCVTPD2DQ xmm1 {k1}{z}, ymm2/m256/m64bcst | FV | V/V | AVX512VL AVX512F | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four signed doubleword integers in xmm1 subject to writemask k1.
EVEX.512.F2.0F.W1 E6 /r | VCVTPD2DQ ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | FV | V/V | AVX512F | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight signed doubleword integers in ymm1 subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
FV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts packed double-precision floating-point values in the source operand (second operand) to packed signed doubleword integers in the destination operand (first operand).

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2^(w-1), where w represents the number of bits in the destination format; 80000000H for doubleword results) is returned.

EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a YMM/XMM/XMM (low 64 bits) register conditionally updated with writemask k1. The upper bits (MAX_VL-1:256/128/64) of the corresponding destination are zeroed.

VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:64) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction will #UD.
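The rounding and indefinite-value rules above can be sketched as a scalar reference function. This is a hypothetical model (`cvt_f64_to_i32_rne` and `cvtpd2dq_legacy_model` are illustrative names, not Intel APIs) that assumes MXCSR.RC = 00b (round to nearest, ties to even) and a masked invalid exception; it models neither the other rounding modes nor any status flags. A double converts successfully only if it rounds into [-2^31, 2^31 - 1], that is, if -2147483648.5 <= x < 2147483647.5 under ties-to-even.

```c
#include <stdint.h>

/* Hypothetical reference for one CVTPD2DQ lane, assuming round to
   nearest (ties to even) and a masked invalid exception. */
static int32_t cvt_f64_to_i32_rne(double x)
{
    /* NaN, infinity, or a value that rounds outside the int32 range:
       return the integer indefinite value 80000000H. */
    if (x != x || x >= 2147483647.5 || x < -2147483648.5)
        return INT32_MIN;
    int64_t t = (int64_t)x;      /* truncate toward zero; in range here */
    double frac = x - (double)t; /* exact fractional part of x */
    if (frac > 0.5 || (frac == 0.5 && (t & 1)))
        t += 1;                  /* round up, or tie to the even neighbor */
    else if (frac < -0.5 || (frac == -0.5 && (t & 1)))
        t -= 1;
    return (int32_t)t;
}

/* Legacy SSE form: two doubles -> DEST[63:0]; DEST[127:64] is zeroed. */
static void cvtpd2dq_legacy_model(const double src[2], int32_t dst[4])
{
    dst[0] = cvt_f64_to_i32_rne(src[0]);
    dst[1] = cvt_f64_to_i32_rne(src[1]);
    dst[2] = 0;
    dst[3] = 0;
}
```

Under these assumptions 2.5 converts to 2 and 3.5 to 4 (ties go to the even integer), while 3.0e9 yields INT32_MIN, the 80000000H indefinite value.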
Figure 3-12. VCVTPD2DQ (VEX.256 encoded version)


Operation

VCVTPD2DQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[k+63:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] ← 0
VCVTPD2DQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0])
                ELSE
                    DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[k+63:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] ← 0

VCVTPD2DQ (VEX.256 encoded version)
DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0])
DEST[63:32] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[127:64])
DEST[95:64] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[191:128])
DEST[127:96] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[255:192])
DEST[MAX_VL-1:128] ← 0

VCVTPD2DQ (VEX.128 encoded version)
DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0])
DEST[63:32] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[127:64])
DEST[MAX_VL-1:64] ← 0

CVTPD2DQ (128-bit Legacy SSE version)
DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0])
DEST[63:32] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[127:64])
DEST[127:64] ← 0
DEST[MAX_VL-1:128] (unmodified)
Intel C/C++ Compiler Intrinsic Equivalent

VCVTPD2DQ __m256i _mm512_cvtpd_epi32( __m512d a);
VCVTPD2DQ __m256i _mm512_mask_cvtpd_epi32( __m256i s, __mmask8 k, __m512d a);
VCVTPD2DQ __m256i _mm512_maskz_cvtpd_epi32( __mmask8 k, __m512d a);
VCVTPD2DQ __m256i _mm512_cvt_roundpd_epi32( __m512d a, int r);
VCVTPD2DQ __m256i _mm512_mask_cvt_roundpd_epi32( __m256i s, __mmask8 k, __m512d a, int r);
VCVTPD2DQ __m256i _mm512_maskz_cvt_roundpd_epi32( __mmask8 k, __m512d a, int r);
VCVTPD2DQ __m128i _mm256_mask_cvtpd_epi32( __m128i s, __mmask8 k, __m256d a);
VCVTPD2DQ __m128i _mm256_maskz_cvtpd_epi32( __mmask8 k, __m256d a);
VCVTPD2DQ __m128i _mm_mask_cvtpd_epi32( __m128i s, __mmask8 k, __m128d a);
VCVTPD2DQ __m128i _mm_maskz_cvtpd_epi32( __mmask8 k, __m128d a);
VCVTPD2DQ __m128i _mm256_cvtpd_epi32 (__m256d src)
CVTPD2DQ __m128i _mm_cvtpd_epi32 (__m128d src)

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

See Exceptions Type 2; additionally, for EVEX-encoded instructions, see Exceptions Type E2.

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
CVTPD2PI—Convert Packed Double-Precision FP Values to Packed Dword Integers


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 2D /r | CVTPD2PI mm, xmm/m128 | RM | Valid | Valid | Convert two packed double-precision floating-point values from xmm/m128 to two packed signed doubleword integers in mm.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two packed double-precision floating-point values in the source operand (second operand) to two packed 
signed doubleword integers in the destination operand (first operand). 

The source operand can be an XMM register or a 128-bit memory location. The destination operand is an MMX technology register.

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register. If a converted result is larger than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPD2PI instruction is executed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 

Operation 

DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer32(SRC[63:0]);
DEST[63:32] ← Convert_Double_Precision_Floating_Point_To_Integer32(SRC[127:64]);

Intel C/C++ Compiler Intrinsic Equivalent 

CVTPD2PI: __m64 _mm_cvtpd_pi32(__m128d a)

SIMD Floating-Point Exceptions 

Invalid, Precision. 

Other Exceptions 

See Table 22-4, "Exception Conditions for Legacy SIMD/MMX Instructions with FP Exception and 16-Byte Alignment," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.
CVTPD2PS—Convert Packed Double-Precision Floating-Point Values to Packed Single-Precision Floating-Point Values


Opcode | Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 5A /r | CVTPD2PS xmm1, xmm2/m128 | RM | V/V | SSE2 | Convert two packed double-precision floating-point values in xmm2/mem to two single-precision floating-point values in xmm1.
VEX.128.66.0F.WIG 5A /r | VCVTPD2PS xmm1, xmm2/m128 | RM | V/V | AVX | Convert two packed double-precision floating-point values in xmm2/mem to two single-precision floating-point values in xmm1.
VEX.256.66.0F.WIG 5A /r | VCVTPD2PS xmm1, ymm2/m256 | RM | V/V | AVX | Convert four packed double-precision floating-point values in ymm2/mem to four single-precision floating-point values in xmm1.
EVEX.128.66.0F.W1 5A /r | VCVTPD2PS xmm1 {k1}{z}, xmm2/m128/m64bcst | FV | V/V | AVX512VL AVX512F | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two single-precision floating-point values in xmm1 with writemask k1.
EVEX.256.66.0F.W1 5A /r | VCVTPD2PS xmm1 {k1}{z}, ymm2/m256/m64bcst | FV | V/V | AVX512VL AVX512F | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four single-precision floating-point values in xmm1 with writemask k1.
EVEX.512.66.0F.W1 5A /r | VCVTPD2PS ymm1 {k1}{z}, zmm2/m512/m64bcst{er} | FV | V/V | AVX512F | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight single-precision floating-point values in ymm1 with writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
FV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two, four or eight packed double-precision floating-point values in the source operand (second operand) 
to two, four or eight packed single-precision floating-point values in the destination operand (first operand). 

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR 
register or the embedded rounding control bits. 

EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a YMM/XMM/XMM (low 64 bits) register conditionally updated with writemask k1. The upper bits (MAX_VL-1:256/128/64) of the corresponding destination are zeroed.

VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:64) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. Bits[127:64] of the destination XMM register are zeroed. However, the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction will #UD.
Figure 3-13. VCVTPD2PS (VEX.256 encoded version)


Operation

VCVTPD2PS (EVEX encoded version) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_Single_Precision_Floating_Point(SRC[k+63:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] ← 0
VCVTPD2PS (EVEX encoded version) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    k ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_Single_Precision_Floating_Point(SRC[63:0])
                ELSE
                    DEST[i+31:i] ← Convert_Double_Precision_Floating_Point_To_Single_Precision_Floating_Point(SRC[k+63:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] ← 0
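To illustrate how the destination is half as wide as the source (i ← j * 32 against k ← j * 64, then DEST[MAX_VL-1:VL/2] ← 0), here is a hypothetical scalar model of the EVEX forms. `vcvtpd2ps_evex_model` and its parameters are illustrative names; `dst` covers only the low VL bits of the destination (2 * kl floats), and the double-to-float rounding is delegated to the C conversion, which assumes an IEEE 754 platform and finite inputs within single-precision range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of EVEX VCVTPD2PS for kl = 2, 4 or 8 double lanes.
   dst holds 2*kl floats (VL bits); lanes kl..2*kl-1 model DEST[VL-1:VL/2]. */
static void vcvtpd2ps_evex_model(float *dst, const double *src, int kl,
                                 uint8_t k1, bool zeroing, bool bcst)
{
    for (int j = 0; j < kl; j++) {  /* dest lane j is 32 bits, source lane j is 64 */
        if ((k1 >> j) & 1u)
            dst[j] = (float)(bcst ? src[0] : src[j]); /* rounds per current mode */
        else if (zeroing)
            dst[j] = 0.0f;          /* zeroing-masking */
        /* merging-masking leaves dst[j] unchanged */
    }
    for (int j = kl; j < 2 * kl; j++)
        dst[j] = 0.0f;              /* upper half of the modeled VL bits is zeroed */
}
```

Keeping the zeroing loop separate from the masked loop mirrors the pseudocode, where DEST[MAX_VL-1:VL/2] is cleared regardless of the writemask.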

VCVTPD2PS (VEX.256 encoded version)
DEST[31:0] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[63:0])
DEST[63:32] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[127:64])
DEST[95:64] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[191:128])
DEST[127:96] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[255:192])
DEST[MAX_VL-1:128] ← 0

VCVTPD2PS (VEX.128 encoded version)
DEST[31:0] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[63:0])
DEST[63:32] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[127:64])
DEST[MAX_VL-1:64] ← 0

CVTPD2PS (128-bit Legacy SSE version)
DEST[31:0] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[63:0])
DEST[63:32] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[127:64])
DEST[127:64] ← 0
DEST[MAX_VL-1:128] (unmodified)
Intel C/C++ Compiler Intrinsic Equivalent

VCVTPD2PS __m256 _mm512_cvtpd_ps( __m512d a);
VCVTPD2PS __m256 _mm512_mask_cvtpd_ps( __m256 s, __mmask8 k, __m512d a);
VCVTPD2PS __m256 _mm512_maskz_cvtpd_ps( __mmask8 k, __m512d a);
VCVTPD2PS __m256 _mm512_cvt_roundpd_ps( __m512d a, int r);
VCVTPD2PS __m256 _mm512_mask_cvt_roundpd_ps( __m256 s, __mmask8 k, __m512d a, int r);
VCVTPD2PS __m256 _mm512_maskz_cvt_roundpd_ps( __mmask8 k, __m512d a, int r);
VCVTPD2PS __m128 _mm256_mask_cvtpd_ps( __m128 s, __mmask8 k, __m256d a);
VCVTPD2PS __m128 _mm256_maskz_cvtpd_ps( __mmask8 k, __m256d a);
VCVTPD2PS __m128 _mm_mask_cvtpd_ps( __m128 s, __mmask8 k, __m128d a);
VCVTPD2PS __m128 _mm_maskz_cvtpd_ps( __mmask8 k, __m128d a);
VCVTPD2PS __m128 _mm256_cvtpd_ps (__m256d a)
CVTPD2PS __m128 _mm_cvtpd_ps (__m128d a)

SIMD Floating-Point Exceptions 

Invalid, Precision, Underflow, Overflow, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2; 

EVEX-encoded instructions, see Exceptions Type E2. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.




Vol.2A 3-243 


INSTRUCTION SET REFERENCE, A-L 


CVTPI2PD—Convert Packed Dword Integers to Packed Double-Precision FP Values


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 2A /r | CVTPI2PD xmm, mm/m64* | RM | Valid | Valid | Convert two packed signed doubleword integers from mm/mem64 to two packed double-precision floating-point values in xmm.


NOTES: 


*Operation is different for different operand sets; see the Description section.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two packed signed doubleword integers in the source operand (second operand) to two packed double-precision floating-point values in the destination operand (first operand).

The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. In addition, depending on the operand configuration:

• For operands xmm, mm: the instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPI2PD instruction is executed.

• For operands xmm, m64: the instruction does not cause a transition to MMX technology and does not take x87 FPU exceptions.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
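Because every 32-bit signed integer fits in the 53-bit significand of a double, CVTPI2PD never has to round, which is consistent with its SIMD Floating-Point Exceptions entry of None. The sketch below is a hypothetical C model (`cvtpi2pd_model` is an illustrative name, not an Intel API) showing that even the extreme doubleword values survive the conversion exactly.

```c
#include <stdint.h>

/* Hypothetical model of CVTPI2PD: two signed doublewords -> two doubles.
   Every int32 value is exactly representable as a double, so no rounding occurs. */
static void cvtpi2pd_model(const int32_t src[2], double dst[2])
{
    dst[0] = (double)src[0]; /* DEST[63:0]   <- SRC[31:0]  */
    dst[1] = (double)src[1]; /* DEST[127:64] <- SRC[63:32] */
}
```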

Operation 

DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0]);
DEST[127:64] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[63:32]);

Intel C/C++ Compiler Intrinsic Equivalent 

CVTPI2PD: __m128d _mm_cvtpi32_pd(__m64 a)

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Table 22-6, "Exception Conditions for Legacy SIMD/MMX Instructions with XMM and without FP Exception," in 
the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. 
CVTPI2PS—Convert Packed Dword Integers to Packed Single-Precision FP Values


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 2A /r | CVTPI2PS xmm, mm/m64 | RM | Valid | Valid | Convert two signed doubleword integers from mm/m64 to two single-precision floating-point values in xmm.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two packed signed doubleword integers in the source operand (second operand) to two packed single-precision floating-point values in the destination operand (first operand).

The source operand can be an MMX technology register or a 64-bit memory location. The destination operand is an XMM register. The results are stored in the low quadword of the destination operand, and the high quadword remains unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.

This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTPI2PS instruction is executed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
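The low-quadword-only write is easy to miss, so here is a minimal sketch, assuming `dst` models the four single-precision lanes of the XMM destination (`cvtpi2ps_model` is a hypothetical name). Only lanes 0 and 1 are written, which mirrors the `_mm_cvtpi32_ps(__m128 a, __m64 b)` intrinsic, where `a` supplies the untouched upper lanes. Rounding is delegated to the C conversion, assumed here to follow the default round-to-nearest mode of an IEEE 754 platform.

```c
#include <stdint.h>

/* Hypothetical model of CVTPI2PS: lanes 0 and 1 (the low quadword) receive
   the converted values; lanes 2 and 3 (the high quadword) are left untouched. */
static void cvtpi2ps_model(float dst[4], const int32_t src[2])
{
    dst[0] = (float)src[0]; /* may round, e.g. 16777217 -> 16777216.0f */
    dst[1] = (float)src[1];
    /* dst[2] and dst[3] intentionally not written */
}
```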

Operation 

DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0]);
DEST[63:32] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:32]);
(* High quadword of destination unchanged *)

Intel C/C++ Compiler Intrinsic Equivalent 

CVTPI2PS: __m128 _mm_cvtpi32_ps(__m128 a, __m64 b)

SIMD Floating-Point Exceptions 

Precision 

Other Exceptions 

See Table 22-5, "Exception Conditions for Legacy SIMD/MMX Instructions with XMM and FP Exception," in the 
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. 
CVTPS2DQ—Convert Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values


Opcode | Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 5B /r | CVTPS2DQ xmm1, xmm2/m128 | RM | V/V | SSE2 | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1.
VEX.128.66.0F.WIG 5B /r | VCVTPS2DQ xmm1, xmm2/m128 | RM | V/V | AVX | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1.
VEX.256.66.0F.WIG 5B /r | VCVTPS2DQ ymm1, ymm2/m256 | RM | V/V | AVX | Convert eight packed single-precision floating-point values from ymm2/mem to eight packed signed doubleword values in ymm1.
EVEX.128.66.0F.W0 5B /r | VCVTPS2DQ xmm1 {k1}{z}, xmm2/m128/m32bcst | FV | V/V | AVX512VL AVX512F | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed signed doubleword values in xmm1 subject to writemask k1.
EVEX.256.66.0F.W0 5B /r | VCVTPS2DQ ymm1 {k1}{z}, ymm2/m256/m32bcst | FV | V/V | AVX512VL AVX512F | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed signed doubleword values in ymm1 subject to writemask k1.
EVEX.512.66.0F.W0 5B /r | VCVTPS2DQ zmm1 {k1}{z}, zmm2/m512/m32bcst{er} | FV | V/V | AVX512F | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed signed doubleword values in zmm1 subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
FV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts four, eight or sixteen packed single-precision floating-point values in the source operand to four, eight or sixteen signed doubleword integers in the destination operand.

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2^(w-1), where w represents the number of bits in the destination format; 80000000H for doubleword results) is returned.

EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instruction will #UD.
Operation

VCVTPS2DQ (EVEX encoded versions) when src operand is a register
(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VCVTPS2DQ (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0])
                ELSE
                    DEST[i+31:i] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0
VCVTPS2DQ (VEX.256 encoded version)
DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0])
DEST[63:32] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[63:32])
DEST[95:64] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[95:64])
DEST[127:96] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[127:96])
DEST[159:128] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[159:128])
DEST[191:160] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[191:160])
DEST[223:192] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[223:192])
DEST[255:224] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[255:224])
DEST[MAX_VL-1:256] ← 0

VCVTPS2DQ (VEX.128 encoded version)
DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0])
DEST[63:32] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[63:32])
DEST[95:64] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[95:64])
DEST[127:96] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[127:96])
DEST[MAX_VL-1:128] ← 0

CVTPS2DQ (128-bit Legacy SSE version)
DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0])
DEST[63:32] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[63:32])
DEST[95:64] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[95:64])
DEST[127:96] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[127:96])
DEST[MAX_VL-1:128] (unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTPS2DQ __m512i _mm512_cvtps_epi32( __m512 a);
VCVTPS2DQ __m512i _mm512_mask_cvtps_epi32( __m512i s, __mmask16 k, __m512 a);
VCVTPS2DQ __m512i _mm512_maskz_cvtps_epi32( __mmask16 k, __m512 a);
VCVTPS2DQ __m512i _mm512_cvt_roundps_epi32( __m512 a, int r);
VCVTPS2DQ __m512i _mm512_mask_cvt_roundps_epi32( __m512i s, __mmask16 k, __m512 a, int r);
VCVTPS2DQ __m512i _mm512_maskz_cvt_roundps_epi32( __mmask16 k, __m512 a, int r);
VCVTPS2DQ __m256i _mm256_mask_cvtps_epi32( __m256i s, __mmask8 k, __m256 a);
VCVTPS2DQ __m256i _mm256_maskz_cvtps_epi32( __mmask8 k, __m256 a);
VCVTPS2DQ __m128i _mm_mask_cvtps_epi32( __m128i s, __mmask8 k, __m128 a);
VCVTPS2DQ __m128i _mm_maskz_cvtps_epi32( __mmask8 k, __m128 a);
VCVTPS2DQ __m256i _mm256_cvtps_epi32 (__m256 a)
CVTPS2DQ __m128i _mm_cvtps_epi32 (__m128 a)

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2; 

EVEX-encoded instructions, see Exceptions Type E2. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.

CVTPS2PD—Convert Packed Single-Precision Floating-Point Values to Packed Double-Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
0F 5A /r CVTPS2PD xmm1, xmm2/m64 | RM | V/V | SSE2 | Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1.
VEX.128.0F.WIG 5A /r VCVTPS2PD xmm1, xmm2/m64 | RM | V/V | AVX | Convert two packed single-precision floating-point values in xmm2/m64 to two packed double-precision floating-point values in xmm1.
VEX.256.0F.WIG 5A /r VCVTPS2PD ymm1, xmm2/m128 | RM | V/V | AVX | Convert four packed single-precision floating-point values in xmm2/m128 to four packed double-precision floating-point values in ymm1.
EVEX.128.0F.W0 5A /r VCVTPS2PD xmm1 {k1}{z}, xmm2/m64/m32bcst | HV | V/V | AVX512VL AVX512F | Convert two packed single-precision floating-point values in xmm2/m64/m32bcst to packed double-precision floating-point values in xmm1 with writemask k1.
EVEX.256.0F.W0 5A /r VCVTPS2PD ymm1 {k1}{z}, xmm2/m128/m32bcst | HV | V/V | AVX512VL | Convert four packed single-precision floating-point values in xmm2/m128/m32bcst to packed double-precision floating-point values in ymm1 with writemask k1.
EVEX.512.0F.W0 5A /r VCVTPS2PD zmm1 {k1}{z}, ymm2/m256/m32bcst{sae} | HV | V/V | AVX512F | Convert eight packed single-precision floating-point values in ymm2/m256/m32bcst to eight packed double-precision floating-point values in zmm1 with writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
HV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two, four or eight packed single-precision floating-point values in the source operand (second operand)
to two, four or eight packed double-precision floating-point values in the destination operand (first operand).

EVEX encoded versions: The source operand is a YMM/XMM/XMM (low 64 bits) register, a 256/128/64-bit memory
location, or a 256/128/64-bit vector broadcast from a 32-bit memory location. The destination operand is a
ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The source operand is an XMM register or 128-bit memory location. The destination
operand is a YMM register. Bits (MAX_VL-1:256) of the corresponding destination ZMM register are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 64-bit memory location. The destination
operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding destination ZMM register are
zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 64-bit memory location. The destination
operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding destination ZMM register are
unmodified.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instructions will #UD.
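
The VEX-encoded and legacy SSE forms differ only in what happens above the converted lanes. The sketch below models a full destination register as eight double-precision lanes, which assumes MAX_VL = 512; zmm_pd and cvtps2pd_upper are hypothetical names, and writemasking and broadcast are deliberately left out.

```c
#include <stdbool.h>

#define MAX_VL_LANES 8   /* assumption: MAX_VL = 512 bits, i.e. 8 double lanes */

/* Hypothetical full-width destination register image. */
typedef struct { double pd[MAX_VL_LANES]; } zmm_pd;

/* Writes n converted lanes (n = 2 for the 128-bit forms, 4 for VEX.256).
   vex = true models the VEX forms, which zero lanes n..MAX_VL_LANES-1;
   vex = false models legacy SSE, which leaves those lanes unmodified. */
static void cvtps2pd_upper(zmm_pd *dest, const float *src, int n, bool vex)
{
    for (int j = 0; j < n; j++)
        dest->pd[j] = (double)src[j];       /* widening is exact */
    if (vex)
        for (int j = n; j < MAX_VL_LANES; j++)
            dest->pd[j] = 0.0;
}
```

Keeping the upper lanes untouched is what lets legacy SSE code run unchanged, while the VEX forms clear them so no stale upper state survives the instruction.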

Figure 3-14. CVTPS2PD (VEX.256 encoded version)


Operation 

VCVTPS2PD (EVEX encoded versions) when src operand is a register
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+63:i] ←
                Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[k+31:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0

VCVTPS2PD (EVEX encoded versions) when src operand is a memory source
(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j ← 0 TO KL-1
    i ← j * 64
    k ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+63:i] ←
                        Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[31:0])
                ELSE
                    DEST[i+63:i] ←
                        Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[k+31:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0
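
With a memory source and EVEX.b = 1, every selected lane reads the same 32-bit element SRC[31:0] instead of SRC[k+31:k]. A minimal C sketch of that loop follows; vcvtps2pd_mem is a hypothetical name, the memory operand is modeled as a float array, and the final DEST[MAX_VL-1:VL] zeroing is omitted.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of the memory-source EVEX loop of VCVTPS2PD.
   kl is the lane count (2, 4 or 8), k the writemask, bcst mirrors EVEX.b. */
static void vcvtps2pd_mem(double *dest, const float *mem, int kl, uint32_t k,
                          bool use_mask, bool zeroing, bool bcst)
{
    for (int j = 0; j < kl; j++) {
        if (!use_mask || ((k >> j) & 1u)) {
            /* EVEX.b = 1: one 32-bit element is broadcast to every lane */
            float s = bcst ? mem[0] : mem[j];
            dest[j] = (double)s;            /* widening is exact */
        } else if (zeroing) {
            dest[j] = 0.0;
        }                                   /* else merging: keep dest[j] */
    }
}
```

Because only mem[0] is read when bcst is true, a caller can pass a pointer to a single float, which mirrors the single 32-bit load behind the m32bcst operand form.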

VCVTPS2PD (VEX.256 encoded version)
DEST[63:0] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[31:0])
DEST[127:64] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[63:32])
DEST[191:128] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[95:64])
DEST[255:192] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[127:96])
DEST[MAX_VL-1:256] ← 0

VCVTPS2PD (VEX.128 encoded version)
DEST[63:0] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[31:0])
DEST[127:64] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[63:32])
DEST[MAX_VL-1:128] ← 0

CVTPS2PD (128-bit Legacy SSE version)
DEST[63:0] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[31:0])
DEST[127:64] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[63:32])
DEST[MAX_VL-1:128] (unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTPS2PD __m512d _mm512_cvtps_pd( __m256 a);
VCVTPS2PD __m512d _mm512_mask_cvtps_pd( __m512d s, __mmask8 k, __m256 a);
VCVTPS2PD __m512d _mm512_maskz_cvtps_pd( __mmask8 k, __m256 a);
VCVTPS2PD __m512d _mm512_cvt_roundps_pd( __m256 a, int sae);
VCVTPS2PD __m512d _mm512_mask_cvt_roundps_pd( __m512d s, __mmask8 k, __m256 a, int sae);
VCVTPS2PD __m512d _mm512_maskz_cvt_roundps_pd( __mmask8 k, __m256 a, int sae);
VCVTPS2PD __m256d _mm256_mask_cvtps_pd( __m256d s, __mmask8 k, __m128 a);
VCVTPS2PD __m256d _mm256_maskz_cvtps_pd( __mmask8 k, __m128 a);
VCVTPS2PD __m128d _mm_mask_cvtps_pd( __m128d s, __mmask8 k, __m128 a);
VCVTPS2PD __m128d _mm_maskz_cvtps_pd( __mmask8 k, __m128 a);
VCVTPS2PD __m256d _mm256_cvtps_pd (__m128 a)
CVTPS2PD __m128d _mm_cvtps_pd (__m128 a)

SIMD Floating-Point Exceptions 

Invalid, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; 

EVEX-encoded instructions, see Exceptions Type E3. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.

CVTPS2PI—Convert Packed Single-Precision FP Values to Packed Dword Integers


Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 2D /r CVTPS2PI mm, xmm/m64 | RM | Valid | Valid | Convert two packed single-precision floating-point values from xmm/m64 to two packed signed doubleword integers in mm.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two packed single-precision floating-point values in the source operand (second operand) to two packed
signed doubleword integers in the destination operand (first operand).

The source operand can be an XMM register or a 64-bit memory location. The destination operand is an MMX
technology register. When the source operand is an XMM register, the two single-precision floating-point values are
contained in the low quadword of the register. When a conversion is inexact, the value returned is rounded
according to the rounding control bits in the MXCSR register. If a converted result exceeds the range limits of a
signed doubleword integer, the floating-point invalid exception is raised, and if this exception is masked, the
indefinite integer value (80000000H) is returned.

CVTPS2PI causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer
is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU
floating-point exception is pending, the exception is handled before the CVTPS2PI instruction is executed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
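
The rounding and overflow rules above can be modeled per lane in portable C. This is a sketch under stated assumptions: the rc argument uses the MXCSR.RC encoding (00b nearest, 01b down, 10b up, 11b toward zero), the invalid exception is assumed masked, no flags are modeled, and cvt_lane and cvtps2pi are hypothetical names.

```c
#include <stdint.h>

/* MXCSR.RC field values (MXCSR bits 14:13). */
enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TOWARD_ZERO = 3 };

/* Hypothetical per-lane model: round f under rounding control rc, and
   return the integer indefinite value 80000000H (INT32_MIN) when the
   result does not fit, assuming the invalid exception is masked. */
static int32_t cvt_lane(float f, unsigned rc)
{
    double x = f;                                   /* widening is exact */
    if (!(x >= -2147483648.0 && x < 2147483648.0))  /* NaN or out of range */
        return INT32_MIN;
    double t = (double)(int32_t)x;                  /* truncate toward zero */
    double frac = x - t;                            /* exact, in (-1, 1) */
    switch (rc & 3u) {
    case RC_DOWN:        if (frac < 0) t -= 1.0; break;
    case RC_UP:          if (frac > 0) t += 1.0; break;
    case RC_TOWARD_ZERO: break;
    default: {                                      /* nearest, ties to even */
        int odd = (int32_t)t & 1;
        if (frac > 0.5 || (frac == 0.5 && odd))        t += 1.0;
        else if (frac < -0.5 || (frac == -0.5 && odd)) t -= 1.0;
        break;
    }
    }
    return (int32_t)t;
}

/* CVTPS2PI: two lanes from the low quadword of the source into mm. */
static void cvtps2pi(int32_t mm[2], const float src[2], unsigned rc)
{
    mm[0] = cvt_lane(src[0], rc);
    mm[1] = cvt_lane(src[1], rc);
}
```

For example, -2.5 becomes -2 under round to nearest (ties go to the even integer) but -3 under round down, while 3e9 is out of range in every mode and produces 80000000H.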

Operation 

DEST[31:0] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0]);
DEST[63:32] ← Convert_Single_Precision_Floating_Point_To_Integer(SRC[63:32]);

Intel C/C++ Compiler Intrinsic Equivalent 

CVTPS2PI: __m64 _mm_cvtps_pi32(__m128 a)

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

See Table 22-5, "Exception Conditions for Legacy SIMD/MMX Instructions with XMM and FP Exception," in the 
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. 

CVTSD2SI—Convert Scalar Double-Precision Floating-Point Value to Doubleword Integer


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 2D /r CVTSD2SI r32, xmm1/m64 | RM | V/V | SSE2 | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer r32.
F2 REX.W 0F 2D /r CVTSD2SI r64, xmm1/m64 | RM | V/N.E. | SSE2 | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer sign-extended into r64.
VEX.128.F2.0F.W0 2D /r VCVTSD2SI r32, xmm1/m64 | RM | V/V | AVX | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer r32.
VEX.128.F2.0F.W1 2D /r VCVTSD2SI r64, xmm1/m64 | RM | V/N.E.¹ | AVX | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer sign-extended into r64.
EVEX.LIG.F2.0F.W0 2D /r VCVTSD2SI r32, xmm1/m64{er} | T1F | V/V | AVX512F | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer r32.
EVEX.LIG.F2.0F.W1 2D /r VCVTSD2SI r64, xmm1/m64{er} | T1F | V/N.E.¹ | AVX512F | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer sign-extended into r64.


NOTES: 

1. VEX.W1/EVEX.W1 in non-64-bit modes is ignored; the instruction behaves as if the W0 version is used.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
T1F | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts a double-precision floating-point value in the source operand (the second operand) to a signed
doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (first
operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a
general-purpose register. When the source operand is an XMM register, the double-precision floating-point value is
contained in the low quadword of the register.

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR
register.

If a converted result exceeds the range limits of a signed doubleword integer (in non-64-bit modes or in 64-bit mode
with REX.W/VEX.W/EVEX.W = 0), the floating-point invalid exception is raised, and if this exception is masked, the
indefinite integer value (80000000H) is returned.

If a converted result exceeds the range limits of a signed quadword integer (in 64-bit mode with
REX.W/VEX.W/EVEX.W = 1), the floating-point invalid exception is raised, and if this exception is masked, the
indefinite integer value (80000000_00000000H) is returned.

Legacy SSE instruction: Use of the REX.W prefix promotes the instruction to produce 64-bit data in 64-bit mode.
See the summary chart at the beginning of this section for encoding data and limits.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b; otherwise, the instructions will #UD.

Software should ensure VCVTSD2SI is encoded with VEX.L=0. Encoding VCVTSD2SI with VEX.L=1 may encounter
unpredictable behavior across different processor generations.

Operation

VCVTSD2SI (EVEX encoded version)
IF SRC *is register* AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode and OperandSize = 64
    THEN DEST[63:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0]);
    ELSE DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0]);
FI

(V)CVTSD2SI
IF 64-Bit Mode and OperandSize = 64
    THEN
        DEST[63:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0]);
    ELSE
        DEST[31:0] ← Convert_Double_Precision_Floating_Point_To_Integer(SRC[63:0]);
FI;
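
The EVEX operation first selects a rounding mode (SET_RM) and then converts with either a 32-bit or a 64-bit operand size. The C sketch below models both steps under stated assumptions: rounding-control values follow the MXCSR.RC encoding (00b nearest, 01b down, 10b up, 11b toward zero), the invalid exception is masked, flags are not modeled, and set_rm, round_rc, cvtsd2si_r32 and cvtsd2si_r64 are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TOWARD_ZERO = 3 };

/* SET_RM: the embedded EVEX.RC is used only when the source is a register
   and EVEX.b = 1; otherwise the MXCSR.RC field (bits 14:13) applies. */
static unsigned set_rm(bool src_is_reg, bool evex_b, unsigned evex_rc,
                       uint32_t mxcsr)
{
    return (src_is_reg && evex_b) ? (evex_rc & 3u) : ((mxcsr >> 13) & 3u);
}

/* Round x to an integral double under rounding control rc. */
static double round_rc(double x, unsigned rc)
{
    if (!(x > -4503599627370496.0 && x < 4503599627370496.0))
        return x;                       /* NaN, inf, or |x| >= 2^52: integral */
    double t = (double)(int64_t)x;      /* truncate toward zero */
    double frac = x - t;                /* exact, in (-1, 1) */
    bool odd = ((int64_t)t & 1) != 0;
    switch (rc & 3u) {
    case RC_DOWN:        if (frac < 0) t -= 1.0; break;
    case RC_UP:          if (frac > 0) t += 1.0; break;
    case RC_TOWARD_ZERO: break;
    default:                            /* nearest, ties to even */
        if (frac > 0.5 || (frac == 0.5 && odd))        t += 1.0;
        else if (frac < -0.5 || (frac == -0.5 && odd)) t -= 1.0;
        break;
    }
    return t;
}

/* OperandSize = 32: out-of-range results give 80000000H. */
static int32_t cvtsd2si_r32(double src, unsigned rm)
{
    double r = round_rc(src, rm);
    return (r >= -2147483648.0 && r <= 2147483647.0) ? (int32_t)r : INT32_MIN;
}

/* 64-bit mode, OperandSize = 64: out-of-range gives 80000000_00000000H. */
static int64_t cvtsd2si_r64(double src, unsigned rm)
{
    double r = round_rc(src, rm);
    return (r >= -9223372036854775808.0 && r < 9223372036854775808.0)
               ? (int64_t)r : INT64_MIN;
}
```

For example, an input of 5e9 overflows the doubleword form and yields 80000000H, yet converts exactly with the quadword form; and with the MXCSR reset value 1F80H, a memory source always rounds to nearest even when EVEX.RC requests a different mode.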

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTSD2SI int _mm_cvtsd_i32(__m128d);
VCVTSD2SI int _mm_cvt_roundsd_i32(__m128d, int r);
VCVTSD2SI __int64 _mm_cvtsd_i64(__m128d);
VCVTSD2SI __int64 _mm_cvt_roundsd_i64(__m128d, int r);
CVTSD2SI __int64 _mm_cvtsd_si64(__m128d);
CVTSD2SI int _mm_cvtsd_si32(__m128d a)

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; 

EVEX-encoded instructions, see Exceptions Type E3NF. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.

CVTSD2SS—Convert Scalar Double-Precision Floating-Point Value to Scalar Single-Precision Floating-Point Value


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 5A /r CVTSD2SS xmm1, xmm2/m64 | RM | V/V | SSE2 | Convert one double-precision floating-point value in xmm2/m64 to one single-precision floating-point value in xmm1.
VEX.NDS.128.F2.0F.WIG 5A /r VCVTSD2SS xmm1, xmm2, xmm3/m64 | RVM | V/V | AVX | Convert one double-precision floating-point value in xmm3/m64 to one single-precision floating-point value and merge with high bits in xmm2.
EVEX.NDS.LIG.F2.0F.W1 5A /r VCVTSD2SS xmm1 {k1}{z}, xmm2, xmm3/m64{er} | T1S | V/V | AVX512F | Convert one double-precision floating-point value in xmm3/m64 to one single-precision floating-point value and merge with high bits in xmm2 under writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Converts a double-precision floating-point value in the "convert-from" source operand (the second operand in the
SSE2 version, otherwise the third operand) to a single-precision floating-point value in the destination operand.

When the "convert-from" operand is an XMM register, the double-precision floating-point value is contained in the
low quadword of the register. The result is stored in the low doubleword of the destination operand. When the
conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register.

128-bit Legacy SSE version: The "convert-from" source operand (the second operand) is an XMM register or
memory location. Bits (MAX_VL-1:32) of the corresponding destination register remain unchanged. The destination
operand is an XMM register.

VEX.128 and EVEX encoded versions: The "convert-from" source operand (the third operand) can be an XMM
register or a 64-bit memory location. The first source and destination operands are XMM registers. Bits (127:32) of
the XMM register destination are copied from the corresponding bits in the first source operand. Bits
(MAX_VL-1:128) of the destination register are zeroed.

EVEX encoded version: The converted result is written to the low doubleword element of the destination under the
writemask.

Software should ensure VCVTSD2SS is encoded with VEX.L=0. Encoding VCVTSD2SS with VEX.L=1 may encounter
unpredictable behavior across different processor generations.
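
The merge rules for the VEX.128 and EVEX forms can be sketched in C with a 128-bit register image of four single-precision lanes. This is an illustration under assumptions: xmm_ps, vcvtsd2ss and vcvtsd2ss_masked are hypothetical names, C's own double-to-float conversion (normally round to nearest) stands in for the instruction's rounding, the MAX_VL zeroing is omitted, and src2 is assumed to lie within the float range because ISO C leaves an out-of-range conversion undefined.

```c
#include <stdbool.h>

/* Hypothetical 128-bit register image: four single-precision lanes. */
typedef struct { float ps[4]; } xmm_ps;

/* VEX.128 form: lane 0 <- SRC2[63:0] narrowed to single precision;
   lanes 1..3 <- SRC1[127:32]. Assumes src2 is within the float range. */
static xmm_ps vcvtsd2ss(xmm_ps src1, double src2)
{
    xmm_ps dest = src1;
    dest.ps[0] = (float)src2;   /* rounded per the current C rounding mode */
    return dest;
}

/* EVEX form with writemask bit k1[0]; lanes 1..3 still come from SRC1. */
static xmm_ps vcvtsd2ss_masked(xmm_ps old_dest, xmm_ps src1, double src2,
                               bool k0, bool zeroing)
{
    xmm_ps dest = src1;
    if (k0)
        dest.ps[0] = (float)src2;
    else
        dest.ps[0] = zeroing ? 0.0f : old_dest.ps[0];
    return dest;
}
```

Note that the writemask affects only the low lane: bits 127:32 are taken from the first source operand whether or not k1[0] is set.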

Operation

VCVTSD2SS (EVEX encoded version)
IF (SRC2 *is register*) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]);
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

VCVTSD2SS (VEX.128 encoded version)
DEST[31:0] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC2[63:0]);
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

CVTSD2SS (128-bit Legacy SSE version)
DEST[31:0] ← Convert_Double_Precision_To_Single_Precision_Floating_Point(SRC[63:0]);
(* DEST[MAX_VL-1:32] Unmodified *)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTSD2SS __m128 _mm_mask_cvtsd_ss(__m128 s, __mmask8 k, __m128 a, __m128d b);
VCVTSD2SS __m128 _mm_maskz_cvtsd_ss(__mmask8 k, __m128 a, __m128d b);
VCVTSD2SS __m128 _mm_cvt_roundsd_ss(__m128 a, __m128d b, int r);
VCVTSD2SS __m128 _mm_mask_cvt_roundsd_ss(__m128 s, __mmask8 k, __m128 a, __m128d b, int r);
VCVTSD2SS __m128 _mm_maskz_cvt_roundsd_ss(__mmask8 k, __m128 a, __m128d b, int r);
CVTSD2SS __m128 _mm_cvtsd_ss(__m128 a, __m128d b)

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3. 




CVTSI2SD—Convert Doubleword Integer to Scalar Double-Precision Floating-Point Value


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 2A /r CVTSI2SD xmm1, r32/m32 | RM | V/V | SSE2 | Convert one signed doubleword integer from r32/m32 to one double-precision floating-point value in xmm1.
F2 REX.W 0F 2A /r CVTSI2SD xmm1, r/m64 | RM | V/N.E. | SSE2 | Convert one signed quadword integer from r/m64 to one double-precision floating-point value in xmm1.
VEX.NDS.128.F2.0F.W0 2A /r VCVTSI2SD xmm1, xmm2, r/m32 | RVM | V/V | AVX | Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm1.
VEX.NDS.128.F2.0F.W1 2A /r VCVTSI2SD xmm1, xmm2, r/m64 | RVM | V/N.E.¹ | AVX | Convert one signed quadword integer from r/m64 to one double-precision floating-point value in xmm1.
EVEX.NDS.LIG.F2.0F.W0 2A /r VCVTSI2SD xmm1, xmm2, r/m32 | T1S | V/V | AVX512F | Convert one signed doubleword integer from r/m32 to one double-precision floating-point value in xmm1.
EVEX.NDS.LIG.F2.0F.W1 2A /r VCVTSI2SD xmm1, xmm2, r/m64{er} | T1S | V/N.E.¹ | AVX512F | Convert one signed quadword integer from r/m64 to one double-precision floating-point value in xmm1.


NOTES: 

1. VEX.W1/EVEX.W1 in non-64-bit modes is ignored; the instruction behaves as if the W0 version is used.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Converts a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the "convert-from"
source operand to a double-precision floating-point value in the destination operand. The result is stored in the low
quadword of the destination operand, and the high quadword is left unchanged. When a conversion is inexact, the
value returned is rounded according to the rounding control bits in the MXCSR register.

The second source operand can be a general-purpose register or a 32/64-bit memory location. The first source and
destination operands are XMM registers.

128-bit Legacy SSE version: Use of the REX.W prefix promotes the instruction to 64-bit operands. The "convert-from"
source operand (the second operand) is a general-purpose register or memory location. The destination is
an XMM register. Bits (MAX_VL-1:64) of the corresponding destination register remain unchanged.

VEX.128 and EVEX encoded versions: The "convert-from" source operand (the third operand) can be a general-purpose
register or a memory location. The first source and destination operands are XMM registers. Bits (127:64)
of the XMM register destination are copied from the corresponding bits in the first source operand. Bits
(MAX_VL-1:128) of the destination register are zeroed.

EVEX.W0 version: An attempt to encode this instruction with EVEX embedded rounding is ignored.

VEX.W1 and EVEX.W1 versions: These promote the instruction to use a 64-bit input value in 64-bit mode.

Software should ensure VCVTSI2SD is encoded with VEX.L=0. Encoding VCVTSI2SD with VEX.L=1 may encounter
unpredictable behavior across different processor generations.
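
Every signed 32-bit integer fits in the 53-bit significand of a double, so the doubleword form never rounds, which is consistent with the EVEX.W0 form ignoring embedded rounding; only the quadword form can be inexact. The sketch below models the VEX.128/EVEX register image with two double-precision lanes. xmm_pd, vcvtsi2sd_w0 and vcvtsi2sd_w1 are hypothetical names, and C's integer-to-double conversion under the default round-to-nearest mode stands in for Convert_Integer_To_Double_Precision_Floating_Point.

```c
#include <stdint.h>

/* Hypothetical 128-bit register image: two double-precision lanes. */
typedef struct { double pd[2]; } xmm_pd;

/* W0 form: DEST[63:0] <- SRC2[31:0] converted; always exact. */
static xmm_pd vcvtsi2sd_w0(xmm_pd src1, int32_t a)
{
    xmm_pd dest = src1;             /* DEST[127:64] <- SRC1[127:64] */
    dest.pd[0] = (double)a;
    return dest;
}

/* W1 form (64-bit mode): may round when |a| exceeds 2^53. */
static xmm_pd vcvtsi2sd_w1(xmm_pd src1, int64_t a)
{
    xmm_pd dest = src1;             /* DEST[127:64] <- SRC1[127:64] */
    dest.pd[0] = (double)a;         /* rounded per the current rounding mode */
    return dest;
}
```

For instance, 2^53 + 1 lies exactly halfway between the adjacent doubles 2^53 and 2^53 + 2, so round to nearest even returns 2^53 and the conversion signals a precision exception, the only SIMD exception listed for this instruction.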

Operation

VCVTSI2SD (EVEX encoded version)
IF (SRC2 *is register*) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode And OperandSize = 64
    THEN
        DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC2[63:0]);
    ELSE
        DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC2[31:0]);
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

VCVTSI2SD (VEX.128 encoded version)
IF 64-Bit Mode And OperandSize = 64
    THEN
        DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC2[63:0]);
    ELSE
        DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC2[31:0]);
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

CVTSI2SD
IF 64-Bit Mode And OperandSize = 64
    THEN
        DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[63:0]);
    ELSE
        DEST[63:0] ← Convert_Integer_To_Double_Precision_Floating_Point(SRC[31:0]);
FI;
DEST[MAX_VL-1:64] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTSI2SD __m128d _mm_cvti32_sd(__m128d s, int a);
VCVTSI2SD __m128d _mm_cvt_roundi32_sd(__m128d s, int a, int r);
VCVTSI2SD __m128d _mm_cvti64_sd(__m128d s, __int64 a);
VCVTSI2SD __m128d _mm_cvt_roundi64_sd(__m128d s, __int64 a, int r);
CVTSI2SD __m128d _mm_cvtsi64_sd(__m128d s, __int64 a);
CVTSI2SD __m128d _mm_cvtsi32_sd(__m128d a, int b)

SIMD Floating-Point Exceptions 

Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3 if W1, else Type 5.
EVEX-encoded instructions, see Exceptions Type E3NF if W1, else Type E10NF.

CVTSI2SS—Convert Doubleword Integer to Scalar Single-Precision Floating-Point Value

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 2A /r CVTSI2SS xmm1, r/m32 | RM | V/V | SSE | Convert one signed doubleword integer from r/m32 to one single-precision floating-point value in xmm1.
F3 REX.W 0F 2A /r CVTSI2SS xmm1, r/m64 | RM | V/N.E. | SSE | Convert one signed quadword integer from r/m64 to one single-precision floating-point value in xmm1.
VEX.NDS.128.F3.0F.W0 2A /r VCVTSI2SS xmm1, xmm2, r/m32 | RVM | V/V | AVX | Convert one signed doubleword integer from r/m32 to one single-precision floating-point value in xmm1.
VEX.NDS.128.F3.0F.W1 2A /r VCVTSI2SS xmm1, xmm2, r/m64 | RVM | V/N.E.¹ | AVX | Convert one signed quadword integer from r/m64 to one single-precision floating-point value in xmm1.
EVEX.NDS.LIG.F3.0F.W0 2A /r VCVTSI2SS xmm1, xmm2, r/m32{er} | T1S | V/V | AVX512F | Convert one signed doubleword integer from r/m32 to one single-precision floating-point value in xmm1.
EVEX.NDS.LIG.F3.0F.W1 2A /r VCVTSI2SS xmm1, xmm2, r/m64{er} | T1S | V/N.E.¹ | AVX512F | Convert one signed quadword integer from r/m64 to one single-precision floating-point value in xmm1.


NOTES: 

1. VEX.W1/EVEX.W1 in non-64-bit modes is ignored; the instruction behaves as if the W0 version is used.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Converts a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the "convert-from"
source operand to a single-precision floating-point value in the destination operand (first operand). The "convert-from"
source operand can be a general-purpose register or a memory location. The destination operand is an XMM
register. The result is stored in the low doubleword of the destination operand, and the upper three doublewords
are left unchanged. When a conversion is inexact, the value returned is rounded according to the rounding control
bits in the MXCSR register or the embedded rounding control bits.

128-bit Legacy SSE version: In 64-bit mode, use of the REX.W prefix promotes the instruction to use a 64-bit input
value. The "convert-from" source operand (the second operand) is a general-purpose register or memory location.
Bits (MAX_VL-1:32) of the corresponding destination register remain unchanged.

VEX.128 and EVEX encoded versions: The "convert-from" source operand (the third operand) can be a general-purpose
register or a memory location. The first source and destination operands are XMM registers. Bits (127:32)
of the XMM register destination are copied from corresponding bits in the first source operand. Bits
(MAX_VL-1:128) of the destination register are zeroed.

EVEX encoded version: The converted result is written to the low doubleword element of the destination under the
writemask.

Software should ensure VCVTSI2SS is encoded with VEX.L=0. Encoding VCVTSI2SS with VEX.L=1 may encounter
unpredictable behavior across different processor generations.
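
Unlike CVTSI2SD, even the doubleword form of CVTSI2SS can be inexact, because a single-precision significand holds only 24 bits. The sketch below rounds a signed 64-bit integer to single precision under each rounding-control value, which is the choice MXCSR.RC or embedded rounding makes; a 32-bit input can be sign-extended to 64 bits first without changing its value. The RC_* encoding mirrors MXCSR.RC, flags are not modeled, and cvt_int_to_single is a hypothetical name.

```c
#include <stdbool.h>
#include <stdint.h>

enum { RC_NEAREST = 0, RC_DOWN = 1, RC_UP = 2, RC_TOWARD_ZERO = 3 };

/* Hypothetical model of Convert_Integer_To_Single_Precision_Floating_Point:
   round |v| to a 24-bit significand according to rc, then apply the sign. */
static float cvt_int_to_single(int64_t v, unsigned rc)
{
    bool neg = v < 0;
    /* |v| computed in unsigned arithmetic so INT64_MIN is safe */
    uint64_t m = neg ? (uint64_t)0 - (uint64_t)v : (uint64_t)v;
    int n = 0;                                  /* bit length of m */
    for (uint64_t t = m; t != 0; t >>= 1)
        n++;
    if (n <= 24)                                /* fits the significand: exact */
        return neg ? -(float)m : (float)m;
    int s = n - 24;                             /* low bits to drop */
    uint64_t q = m >> s;
    uint64_t rem = m & ((UINT64_C(1) << s) - 1);
    uint64_t half = UINT64_C(1) << (s - 1);
    bool inc;                                   /* round the magnitude up? */
    switch (rc & 3u) {
    case RC_DOWN:        inc = neg && rem != 0;  break;   /* toward -inf */
    case RC_UP:          inc = !neg && rem != 0; break;   /* toward +inf */
    case RC_TOWARD_ZERO: inc = false;            break;
    default:             inc = rem > half || (rem == half && (q & 1u)); break;
    }
    if (inc)
        q++;                                    /* q <= 2^24: still exact */
    float r = (float)q * (float)(UINT64_C(1) << s);  /* exact power-of-two scale */
    return neg ? -r : r;
}
```

For example, 16777217 (2^24 + 1) needs 25 significant bits: round to nearest even gives 16777216, round up gives 16777218, and the precision exception is signaled in both cases.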

Operation

VCVTSI2SS (EVEX encoded version)
IF (SRC2 *is register*) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode And OperandSize = 64
    THEN
        DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:0]);
    ELSE
        DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0]);
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

VCVTSI2SS (VEX.128 encoded version)
IF 64-Bit Mode And OperandSize = 64
    THEN
        DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:0]);
    ELSE
        DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0]);
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

CVTSI2SS (128-bit Legacy SSE version)
IF 64-Bit Mode And OperandSize = 64
    THEN
        DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[63:0]);
    ELSE
        DEST[31:0] ← Convert_Integer_To_Single_Precision_Floating_Point(SRC[31:0]);
FI;
DEST[MAX_VL-1:32] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTSI2SS __m128 _mm_cvti32_ss(__m128 s, int a);
VCVTSI2SS __m128 _mm_cvt_roundi32_ss(__m128 s, int a, int r);
VCVTSI2SS __m128 _mm_cvti64_ss(__m128 s, __int64 a);
VCVTSI2SS __m128 _mm_cvt_roundi64_ss(__m128 s, __int64 a, int r);
CVTSI2SS __m128 _mm_cvtsi64_ss(__m128 s, __int64 a);
CVTSI2SS __m128 _mm_cvtsi32_ss(__m128 a, int b);

SIMD Floating-Point Exceptions 

Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3NF. 

CVTSS2SD—Convert Scalar Single-Precision Floating-Point Value to Scalar Double-Precision Floating-Point Value


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 5A /r CVTSS2SD xmm1, xmm2/m32 | RM | V/V | SSE2 | Convert one single-precision floating-point value in xmm2/m32 to one double-precision floating-point value in xmm1.
VEX.NDS.128.F3.0F.WIG 5A /r VCVTSS2SD xmm1, xmm2, xmm3/m32 | RVM | V/V | AVX | Convert one single-precision floating-point value in xmm3/m32 to one double-precision floating-point value and merge with high bits of xmm2.
EVEX.NDS.LIG.F3.0F.W0 5A /r VCVTSS2SD xmm1 {k1}{z}, xmm2, xmm3/m32{sae} | T1S | V/V | AVX512F | Convert one single-precision floating-point value in xmm3/m32 to one double-precision floating-point value and merge with high bits of xmm2 under writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Converts a single-precision floating-point value in the "convert-from" source operand to a double-precision
floating-point value in the destination operand. When the "convert-from" source operand is an XMM register, the
single-precision floating-point value is contained in the low doubleword of the register. The result is stored in the
low quadword of the destination operand.

128-bit Legacy SSE version: The "convert-from" source operand (the second operand) is an XMM register or
memory location. Bits (MAX_VL-1:64) of the corresponding destination register remain unchanged. The destination
operand is an XMM register.

VEX.128 and EVEX encoded versions: The "convert-from" source operand (the third operand) can be an XMM
register or a 32-bit memory location. The first source and destination operands are XMM registers. Bits (127:64) of
the XMM register destination are copied from the corresponding bits in the first source operand. Bits
(MAX_VL-1:128) of the destination register are zeroed.

Software should ensure VCVTSS2SD is encoded with VEX.L=0. Encoding VCVTSS2SD with VEX.L=1 may encounter
unpredictable behavior across different processor generations.
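
Because every single-precision value is also exactly representable in double precision, this conversion never rounds. That is consistent with the EVEX form accepting only {sae} rather than {er}, and with the exception list naming Invalid and Denormal but not Precision. The sketch below models the VEX.128 merge on a two-lane register image; xmm_pd and vcvtss2sd are hypothetical names, and the MAX_VL zeroing and writemasking are omitted.

```c
/* Hypothetical 128-bit register image: two double-precision lanes. */
typedef struct { double pd[2]; } xmm_pd;

/* VEX.128 form: DEST[63:0] <- SRC2[31:0] widened; DEST[127:64] <- SRC1[127:64].
   Widening is exact, so converting back to float recovers the input. */
static xmm_pd vcvtss2sd(xmm_pd src1, float src2)
{
    xmm_pd dest = src1;
    dest.pd[0] = (double)src2;
    return dest;
}
```

A single-precision denormal input, such as 1.0e-40f, widens to a normal double without any loss of value, although the instruction still reports it through the Denormal flag.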

Operation

VCVTSS2SD (EVEX encoded version)
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC2[31:0]);
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

VCVTSS2SD (VEX.128 encoded version)
DEST[63:0] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC2[31:0])
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

CVTSS2SD (128-bit Legacy SSE version)
DEST[63:0] ← Convert_Single_Precision_To_Double_Precision_Floating_Point(SRC[31:0]);
DEST[MAX_VL-1:64] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTSS2SD __m128d _mm_cvt_roundss_sd(__m128d a, __m128 b, int r);
VCVTSS2SD __m128d _mm_mask_cvt_roundss_sd(__m128d s, __mmask8 m, __m128d a, __m128 b, int r);
VCVTSS2SD __m128d _mm_maskz_cvt_roundss_sd(__mmask8 k, __m128d a, __m128 b, int r);
VCVTSS2SD __m128d _mm_mask_cvtss_sd(__m128d s, __mmask8 m, __m128d a, __m128 b);
VCVTSS2SD __m128d _mm_maskz_cvtss_sd(__mmask8 m, __m128d a, __m128 b);
CVTSS2SD __m128d _mm_cvtss_sd(__m128d a, __m128 b);

SIMD Floating-Point Exceptions 

Invalid, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3. 

CVTSS2SI—Convert Scalar Single-Precision Floating-Point Value to Doubleword Integer


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 2D /r CVTSS2SI r32, xmm1/m32 | RM | V/V | SSE | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32.
F3 REX.W 0F 2D /r CVTSS2SI r64, xmm1/m32 | RM | V/N.E. | SSE | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64.
VEX.128.F3.0F.W0 2D /r VCVTSS2SI r32, xmm1/m32 | RM | V/V | AVX | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32.
VEX.128.F3.0F.W1 2D /r VCVTSS2SI r64, xmm1/m32 | RM | V/N.E.¹ | AVX | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64.
EVEX.LIG.F3.0F.W0 2D /r VCVTSS2SI r32, xmm1/m32{er} | T1F | V/V | AVX512F | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32.
EVEX.LIG.F3.0F.W1 2D /r VCVTSS2SI r64, xmm1/m32{er} | T1F | V/N.E.¹ | AVX512F | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64.


NOTES: 

1. VEX.W1/EVEX.W1 is ignored in non-64-bit modes; the instruction behaves as if the W0 version is used.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
T1F | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts a single-precision floating-point value in the source operand (the second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (the first operand). The source operand can be an XMM register or a memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the single-precision floating-point value is contained in the low doubleword of the register.

When a conversion is inexact, the value returned is rounded according to the rounding control bits in the MXCSR register or the embedded rounding control bits. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (2^(w-1), where w represents the number of bits in the destination format) is returned.

Legacy SSE instructions: In 64-bit mode, use of the REX.W prefix promotes the instruction to produce 64-bit data. See the summary chart at the beginning of this section for encoding data and limits.

VEX.W1 and EVEX.W1 versions: promote the instruction to produce 64-bit data in 64-bit mode.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 

Software should ensure VCVTSS2SI is encoded with VEX.L=0. Encoding VCVTSS2SI with VEX.L=1 may encounter 
unpredictable behavior across different processor generations. 
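A minimal sketch of the default rounding behavior, assuming an x86-64 target, the SSE intrinsic _mm_cvtss_si32 from <xmmintrin.h>, and the power-on MXCSR state (round to nearest even, all exceptions masked). The wrapper name cvtss2si32 is hypothetical.

```c
#include <stdint.h>
#include <xmmintrin.h> /* SSE: _mm_set_ss, _mm_cvtss_si32 */

/* Hypothetical wrapper around CVTSS2SI r32, xmm1. Under the default MXCSR
   rounding mode (round to nearest, ties to even) halfway cases go to the
   even integer; unrepresentable inputs return the integer indefinite
   value 80000000H when the invalid exception is masked. */
static int32_t cvtss2si32(float x)
{
    return _mm_cvtss_si32(_mm_set_ss(x));
}
```

With these assumptions, 2.5f converts to 2 and 3.5f to 4, while 3.0e9f, which exceeds 2^31 - 1, returns INT32_MIN (80000000H).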
Operation

VCVTSS2SI (EVEX encoded version)

IF (SRC *is register*) AND (EVEX.b = 1)
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF 64-Bit Mode and OperandSize = 64
    THEN
        DEST[63:0] <- Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0]);
    ELSE
        DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0]);
FI;

(V)CVTSS2SI (Legacy and VEX.128 encoded version)

IF 64-Bit Mode and OperandSize = 64
    THEN
        DEST[63:0] <- Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0]);
    ELSE
        DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer(SRC[31:0]);
FI;
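The SET_RM(MXCSR.RM) path can be exercised from C by temporarily changing the MXCSR rounding-control field. This is a sketch assuming x86-64 and the _MM_GET_ROUNDING_MODE/_MM_SET_ROUNDING_MODE macros from <xmmintrin.h>; the helper name is hypothetical, and the EVEX {er} override is not shown because it requires AVX-512.

```c
#include <stdint.h>
#include <xmmintrin.h> /* _MM_GET/SET_ROUNDING_MODE, _mm_set_ss, _mm_cvtss_si32 */

/* Hypothetical helper: convert x with MXCSR.RC temporarily set to `mode`
   (_MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP or
   _MM_ROUND_TOWARD_ZERO), then restore the caller's rounding mode. */
static int32_t cvtss2si32_with_mode(float x, unsigned int mode)
{
    volatile float vx = x; /* volatile read: keeps the compiler from folding
                              or merging conversions across mode changes */
    unsigned int saved = _MM_GET_ROUNDING_MODE();
    _MM_SET_ROUNDING_MODE(mode);
    int32_t r = _mm_cvtss_si32(_mm_set_ss(vx));
    _MM_SET_ROUNDING_MODE(saved);
    return r;
}
```

Restoring the saved mode matters because MXCSR is per-thread state that other floating-point code depends on.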

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTSS2SI int _mm_cvtss_i32(__m128 a);
VCVTSS2SI int _mm_cvt_roundss_i32(__m128 a, int r);
VCVTSS2SI __int64 _mm_cvtss_i64(__m128 a);
VCVTSS2SI __int64 _mm_cvt_roundss_i64(__m128 a, int r);

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; additionally 
#UD If VEX.vvvv != 1111B.

EVEX-encoded instructions, see Exceptions Type E3NF. 
CVTTPD2DQ—Convert with Truncation Packed Double-Precision Floating-Point Values to Packed Doubleword Integers


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F E6 /r CVTTPD2DQ xmm1, xmm2/m128 | RM | V/V | SSE2 | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1 using truncation.
VEX.128.66.0F.WIG E6 /r VCVTTPD2DQ xmm1, xmm2/m128 | RM | V/V | AVX | Convert two packed double-precision floating-point values in xmm2/mem to two signed doubleword integers in xmm1 using truncation.
VEX.256.66.0F.WIG E6 /r VCVTTPD2DQ xmm1, ymm2/m256 | RM | V/V | AVX | Convert four packed double-precision floating-point values in ymm2/mem to four signed doubleword integers in xmm1 using truncation.
EVEX.128.66.0F.W1 E6 /r VCVTTPD2DQ xmm1 {k1}{z}, xmm2/m128/m64bcst | FV | V/V | AVX512VL AVX512F | Convert two packed double-precision floating-point values in xmm2/m128/m64bcst to two signed doubleword integers in xmm1 using truncation subject to writemask k1.
EVEX.256.66.0F.W1 E6 /r VCVTTPD2DQ xmm1 {k1}{z}, ymm2/m256/m64bcst | FV | V/V | AVX512VL AVX512F | Convert four packed double-precision floating-point values in ymm2/m256/m64bcst to four signed doubleword integers in xmm1 using truncation subject to writemask k1.
EVEX.512.66.0F.W1 E6 /r VCVTTPD2DQ ymm1 {k1}{z}, zmm2/m512/m64bcst{sae} | FV | V/V | AVX512F | Convert eight packed double-precision floating-point values in zmm2/m512/m64bcst to eight signed doubleword integers in ymm1 using truncation subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
FV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two, four or eight packed double-precision floating-point values in the source operand (second operand) 
to two, four or eight packed signed doubleword integers in the destination operand (first operand). 

When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result is larger than 
the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is 
masked, the indefinite integer value (80000000H) is returned. 
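The pseudocode helper Convert_Double_Precision_Floating_Point_To_Integer_Truncate can be modeled portably, assuming the invalid exception is masked. In addition to results above the maximum, the hardware also returns the indefinite value for NaN inputs and for results below the minimum signed doubleword integer; the model below covers all three. The function name is hypothetical.

```c
#include <stdint.h>

/* Hypothetical software model of
   Convert_Double_Precision_Floating_Point_To_Integer_Truncate for a
   doubleword destination, assuming the invalid exception is masked.
   A double truncates to a representable int32 exactly when
   -2^31 - 1 < x < 2^31; a NaN fails both comparisons. */
static int32_t cvtt_f64_to_i32(double x)
{
    if (x > -2147483649.0 && x < 2147483648.0)
        return (int32_t)x;  /* C conversion truncates toward zero */
    return INT32_MIN;       /* integer indefinite, 80000000H */
}
```

Note that -2147483648.9 truncates to -2147483648, which is representable, so it is a valid result that merely has the same bit pattern as the indefinite value.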

EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 64-bit memory location. The destination operand is a YMM/XMM/XMM (low 64 bits) register conditionally updated with writemask k1. The upper bits (MAX_VL-1:VL/2) of the corresponding destination are zeroed.

VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:64) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. Bits 127:64 of the destination are zeroed, and the upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 
Figure 3-15. VCVTTPD2DQ (VEX.256 encoded version)


Operation 

VCVTTPD2DQ (EVEX encoded versions) when src operand is a register

(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j <- 0 TO KL-1
    i <- j * 32
    k <- j * 64
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] <-
            Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[k+63:k])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] <- 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] <- 0

VCVTTPD2DQ (EVEX encoded versions) when src operand is a memory source

(KL, VL) = (2, 128), (4, 256), (8, 512)
FOR j <- 0 TO KL-1
    i <- j * 32
    k <- j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] <-
                        Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[63:0])
                ELSE
                    DEST[i+31:i] <-
                        Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[k+63:k])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] <- 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL/2] <- 0

VCVTTPD2DQ (VEX.256 encoded version)

DEST[31:0] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[63:0])
DEST[63:32] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[127:64])
DEST[95:64] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[191:128])
DEST[127:96] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[255:192])
DEST[MAX_VL-1:128] <- 0

VCVTTPD2DQ (VEX.128 encoded version)

DEST[31:0] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[63:0])
DEST[63:32] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[127:64])
DEST[MAX_VL-1:64] <- 0

CVTTPD2DQ (128-bit Legacy SSE version)

DEST[31:0] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[63:0])
DEST[63:32] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[127:64])
DEST[127:64] <- 0
DEST[MAX_VL-1:128] (unmodified)
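A minimal sketch of the 128-bit form, assuming an x86-64 target with SSE2 and the _mm_cvttpd_epi32 intrinsic from <emmintrin.h>, shows both the truncation and the zeroing of DEST[127:64]. The helper name cvttpd2dq_demo is hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_set_pd, _mm_cvttpd_epi32, _mm_storeu_si128 */

/* Hypothetical helper: CVTTPD2DQ on {lo, hi}. The truncated results land in
   doublewords 0 and 1; doublewords 2 and 3 (bits 127:64) are zeroed. */
static void cvttpd2dq_demo(double lo, double hi, int32_t out[4])
{
    __m128i r = _mm_cvttpd_epi32(_mm_set_pd(hi, lo)); /* _mm_set_pd takes the high lane first */
    _mm_storeu_si128((__m128i *)out, r);
}
```

For example, {1.9, -2.7} produces {1, -2, 0, 0}, and an out-of-range element such as 3.0e9 produces 80000000H (INT32_MIN).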
Intel C/C++ Compiler Intrinsic Equivalent

VCVTTPD2DQ __m256i _mm512_cvttpd_epi32(__m512d a);
VCVTTPD2DQ __m256i _mm512_mask_cvttpd_epi32(__m256i s, __mmask8 k, __m512d a);
VCVTTPD2DQ __m256i _mm512_maskz_cvttpd_epi32(__mmask8 k, __m512d a);
VCVTTPD2DQ __m256i _mm512_cvtt_roundpd_epi32(__m512d a, int sae);
VCVTTPD2DQ __m256i _mm512_mask_cvtt_roundpd_epi32(__m256i s, __mmask8 k, __m512d a, int sae);
VCVTTPD2DQ __m256i _mm512_maskz_cvtt_roundpd_epi32(__mmask8 k, __m512d a, int sae);
VCVTTPD2DQ __m128i _mm256_mask_cvttpd_epi32(__m128i s, __mmask8 k, __m256d a);
VCVTTPD2DQ __m128i _mm256_maskz_cvttpd_epi32(__mmask8 k, __m256d a);
VCVTTPD2DQ __m128i _mm_mask_cvttpd_epi32(__m128i s, __mmask8 k, __m128d a);
VCVTTPD2DQ __m128i _mm_maskz_cvttpd_epi32(__mmask8 k, __m128d a);
VCVTTPD2DQ __m128i _mm256_cvttpd_epi32(__m256d src);
CVTTPD2DQ __m128i _mm_cvttpd_epi32(__m128d src);

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2; 

EVEX-encoded instructions, see Exceptions Type E2. 

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
CVTTPD2PI—Convert with Truncation Packed Double-Precision FP Values to Packed Dword Integers


Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
66 0F 2C /r CVTTPD2PI mm, xmm/m128 | RM | Valid | Valid | Convert two packed double-precision floating-point values from xmm/m128 to two packed signed doubleword integers in mm using truncation.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two packed double-precision floating-point values in the source operand (second operand) to two packed 
signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register 
or a 128-bit memory location. The destination operand is an MMX technology register. 

When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger 
than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is 
masked, the indefinite integer value (80000000H) is returned. 

This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPD2PI instruction is executed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
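A hedged sketch, assuming an x86-64 GCC or Clang toolchain that still provides the MMX intrinsics _mm_cvttpd_pi32 and _mm_empty (some newer compilers implement them with SSE2 instructions rather than MMX registers). Because the instruction places the x87 unit in MMX state, the sketch issues EMMS through _mm_empty before returning so that later x87 code is unaffected. The helper name is hypothetical.

```c
#include <stdint.h>
#include <string.h>
#include <emmintrin.h> /* SSE2 plus MMX types: __m64, _mm_cvttpd_pi32, _mm_empty */

/* Hypothetical helper: CVTTPD2PI converts {lo, hi} into the two doublewords
   of an MMX register, truncating toward zero. */
static void cvttpd2pi_demo(double lo, double hi, int32_t out[2])
{
    __m64 r = _mm_cvttpd_pi32(_mm_set_pd(hi, lo));
    memcpy(out, &r, sizeof r); /* doubleword 0 = lo, doubleword 1 = hi */
    _mm_empty();               /* EMMS: leave MMX state before any x87 code */
}
```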

Operation 

DEST[31:0] <- Convert_Double_Precision_Floating_Point_To_Integer32_Truncate(SRC[63:0]);
DEST[63:32] <- Convert_Double_Precision_Floating_Point_To_Integer32_Truncate(SRC[127:64]);

Intel C/C++ Compiler Intrinsic Equivalent 

CVTTPD2PI: __m64 _mm_cvttpd_pi32(__m128d a)

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Mode Exceptions 

See Table 22-4, "Exception Conditions for Legacy SIMD/MMX Instructions with FP Exception and 16-Byte Alignment," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B.
INSTRUCTION SET REFERENCE, A-L


CVTTPS2DQ—Convert with Truncation Packed Single-Precision Floating-Point Values to Packed Signed Doubleword Integer Values


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 5B /r CVTTPS2DQ xmm1, xmm2/m128 | RM | V/V | SSE2 | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1 using truncation.
VEX.128.F3.0F.WIG 5B /r VCVTTPS2DQ xmm1, xmm2/m128 | RM | V/V | AVX | Convert four packed single-precision floating-point values from xmm2/mem to four packed signed doubleword values in xmm1 using truncation.
VEX.256.F3.0F.WIG 5B /r VCVTTPS2DQ ymm1, ymm2/m256 | RM | V/V | AVX | Convert eight packed single-precision floating-point values from ymm2/mem to eight packed signed doubleword values in ymm1 using truncation.
EVEX.128.F3.0F.W0 5B /r VCVTTPS2DQ xmm1 {k1}{z}, xmm2/m128/m32bcst | FV | V/V | AVX512VL AVX512F | Convert four packed single-precision floating-point values from xmm2/m128/m32bcst to four packed signed doubleword values in xmm1 using truncation subject to writemask k1.
EVEX.256.F3.0F.W0 5B /r VCVTTPS2DQ ymm1 {k1}{z}, ymm2/m256/m32bcst | FV | V/V | AVX512VL AVX512F | Convert eight packed single-precision floating-point values from ymm2/m256/m32bcst to eight packed signed doubleword values in ymm1 using truncation subject to writemask k1.
EVEX.512.F3.0F.W0 5B /r VCVTTPS2DQ zmm1 {k1}{z}, zmm2/m512/m32bcst{sae} | FV | V/V | AVX512F | Convert sixteen packed single-precision floating-point values from zmm2/m512/m32bcst to sixteen packed signed doubleword values in zmm1 using truncation subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
FV | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts four, eight or sixteen packed single-precision floating-point values in the source operand to four, eight or 
sixteen signed doubleword integers in the destination operand. 

When a conversion is inexact, a truncated (round toward zero) value is returned. If a converted result is larger than 
the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is 
masked, the indefinite integer value (80000000H) is returned. 
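A minimal sketch of the 128-bit form, assuming an x86-64 target with SSE2 and the _mm_cvttps_epi32 intrinsic from <emmintrin.h>. The helper name cvttps2dq_demo is hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_loadu_ps, _mm_cvttps_epi32, _mm_storeu_si128 */

/* Hypothetical helper: CVTTPS2DQ on four floats, truncating each toward
   zero; out-of-range elements become 80000000H (invalid exception masked). */
static void cvttps2dq_demo(const float in[4], int32_t out[4])
{
    __m128i r = _mm_cvttps_epi32(_mm_loadu_ps(in));
    _mm_storeu_si128((__m128i *)out, r);
}
```

With input {2.9f, -2.9f, 0.5f, 3.0e9f}, the result is {2, -2, 0, INT32_MIN}: each element is handled independently, so one invalid element does not disturb the others.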

EVEX encoded versions: The source operand is a ZMM/YMM/XMM register, a 512/256/128-bit memory location, or a 512/256/128-bit vector broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally updated with writemask k1.

VEX.256 encoded version: The source operand is a YMM register or 256-bit memory location. The destination operand is a YMM register. The upper bits (MAX_VL-1:256) of the corresponding ZMM register destination are zeroed.

VEX.128 encoded version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The source operand is an XMM register or 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128) of the corresponding ZMM register destination are unmodified.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD.
Operation

VCVTTPS2DQ (EVEX encoded versions) when src operand is a register

(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j <- 0 TO KL-1
    i <- j * 32
    IF k1[j] OR *no writemask*
        THEN DEST[i+31:i] <-
            Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[i+31:i])
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] <- 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] <- 0

VCVTTPS2DQ (EVEX encoded versions) when src operand is a memory source

(KL, VL) = (4, 128), (8, 256), (16, 512)
FOR j <- 0 TO KL-1
    i <- j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1)
                THEN
                    DEST[i+31:i] <-
                        Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0])
                ELSE
                    DEST[i+31:i] <-
                        Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[i+31:i])
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] <- 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] <- 0
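The writemask logic of the EVEX loops above can be modeled portably. This is a sketch, not Intel's implementation: the function names are hypothetical, an all-ones k1 stands in for *no writemask*, the broadcast form (EVEX.b = 1, where every element reads SRC[31:0]) would simply pass src[0] for each j, and the zeroing of DEST[MAX_VL-1:VL] is not modeled.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of Convert_Single_Precision_Floating_Point_To_Integer_Truncate:
   a float truncates into int32 range exactly when -2^31 <= x < 2^31 (no float
   lies strictly between -2^31 - 1 and -2^31); a NaN fails both comparisons. */
static int32_t cvtt_f32_to_i32(float x)
{
    if (x >= -2147483648.0f && x < 2147483648.0f)
        return (int32_t)x;  /* C conversion truncates toward zero */
    return INT32_MIN;       /* integer indefinite, 80000000H */
}

/* Hypothetical model of the register-source loop for KL elements (KL <= 16).
   Element j is converted when bit j of k1 is set; otherwise it is kept
   (merging-masking) or cleared (zeroing-masking). */
static void vcvttps2dq_model(int32_t dst[], const float src[], int kl,
                             uint16_t k1, bool zeroing)
{
    for (int j = 0; j < kl; j++) {
        if (k1 & (1u << j))
            dst[j] = cvtt_f32_to_i32(src[j]);
        else if (zeroing)
            dst[j] = 0;
        /* else: merging-masking leaves dst[j] unchanged */
    }
}
```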

VCVTTPS2DQ (VEX.256 encoded version)

DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0])
DEST[63:32] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[63:32])
DEST[95:64] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[95:64])
DEST[127:96] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[127:96])
DEST[159:128] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[159:128])
DEST[191:160] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[191:160])
DEST[223:192] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[223:192])
DEST[255:224] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[255:224])
DEST[MAX_VL-1:256] <- 0

VCVTTPS2DQ (VEX.128 encoded version)

DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0])
DEST[63:32] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[63:32])
DEST[95:64] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[95:64])
DEST[127:96] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[127:96])
DEST[MAX_VL-1:128] <- 0

CVTTPS2DQ (128-bit Legacy SSE version) 

DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0])
DEST[63:32] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[63:32])
DEST[95:64] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[95:64])
DEST[127:96] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[127:96])
DEST[MAX_VL-1:128] (unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTTPS2DQ __m512i _mm512_cvttps_epi32(__m512 a);
VCVTTPS2DQ __m512i _mm512_mask_cvttps_epi32(__m512i s, __mmask16 k, __m512 a);
VCVTTPS2DQ __m512i _mm512_maskz_cvttps_epi32(__mmask16 k, __m512 a);
VCVTTPS2DQ __m512i _mm512_cvtt_roundps_epi32(__m512 a, int sae);
VCVTTPS2DQ __m512i _mm512_mask_cvtt_roundps_epi32(__m512i s, __mmask16 k, __m512 a, int sae);
VCVTTPS2DQ __m512i _mm512_maskz_cvtt_roundps_epi32(__mmask16 k, __m512 a, int sae);
VCVTTPS2DQ __m256i _mm256_mask_cvttps_epi32(__m256i s, __mmask8 k, __m256 a);
VCVTTPS2DQ __m256i _mm256_maskz_cvttps_epi32(__mmask8 k, __m256 a);
VCVTTPS2DQ __m128i _mm_mask_cvttps_epi32(__m128i s, __mmask8 k, __m128 a);
VCVTTPS2DQ __m128i _mm_maskz_cvttps_epi32(__mmask8 k, __m128 a);
VCVTTPS2DQ __m256i _mm256_cvttps_epi32(__m256 a);
CVTTPS2DQ __m128i _mm_cvttps_epi32(__m128 a);

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2.

EVEX-encoded instructions, see Exceptions Type E2.

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
CVTTPS2PI—Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers


Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 2C /r CVTTPS2PI mm, xmm/m64 | RM | Valid | Valid | Convert two single-precision floating-point values from xmm/m64 to two signed doubleword integers in mm using truncation.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts two packed single-precision floating-point values in the source operand (second operand) to two packed 
signed doubleword integers in the destination operand (first operand). The source operand can be an XMM register 
or a 64-bit memory location. The destination operand is an MMX technology register. When the source operand is 
an XMM register, the two single-precision floating-point values are contained in the low quadword of the register. 

When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result is larger 
than the maximum signed doubleword integer, the floating-point invalid exception is raised, and if this exception is 
masked, the indefinite integer value (80000000H) is returned. 

This instruction causes a transition from x87 FPU to MMX technology operation (that is, the x87 FPU top-of-stack pointer is set to 0 and the x87 FPU tag word is set to all 0s [valid]). If this instruction is executed while an x87 FPU floating-point exception is pending, the exception is handled before the CVTTPS2PI instruction is executed.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 

Operation 

DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0]);
DEST[63:32] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[63:32]);

Intel C/C++ Compiler Intrinsic Equivalent 

CVTTPS2PI: __m64 _mm_cvttps_pi32(__m128 a)

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

See Table 22-5, "Exception Conditions for Legacy SIMD/MMX Instructions with XMM and FP Exception," in the 
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3B. 


CVTTPS2PI—Convert with Truncation Packed Single-Precision FP Values to Packed Dword Integers


Vol.2A 3-273


CVTTSD2SI—Convert with Truncation Scalar Double-Precision Floating-Point Value to Signed Integer


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 2C /r CVTTSD2SI r32, xmm1/m64 | RM | V/V | SSE2 | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer in r32 using truncation.
F2 REX.W 0F 2C /r CVTTSD2SI r64, xmm1/m64 | RM | V/N.E. | SSE2 | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer in r64 using truncation.
VEX.128.F2.0F.W0 2C /r VCVTTSD2SI r32, xmm1/m64 | RM | V/V | AVX | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer in r32 using truncation.
VEX.128.F2.0F.W1 2C /r VCVTTSD2SI r64, xmm1/m64 | RM | V/N.E.¹ | AVX | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer in r64 using truncation.
EVEX.LIG.F2.0F.W0 2C /r VCVTTSD2SI r32, xmm1/m64{sae} | T1F | V/V | AVX512F | Convert one double-precision floating-point value from xmm1/m64 to one signed doubleword integer in r32 using truncation.
EVEX.LIG.F2.0F.W1 2C /r VCVTTSD2SI r64, xmm1/m64{sae} | T1F | V/N.E.¹ | AVX512F | Convert one double-precision floating-point value from xmm1/m64 to one signed quadword integer in r64 using truncation.


NOTES: 

1. For this specific instruction, VEX.W/EVEX.W is ignored in non-64-bit modes; the instruction behaves as if the W0 version is used.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
T1F | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts a double-precision floating-point value in the source operand (the second operand) to a signed doubleword integer (or signed quadword integer if operand size is 64 bits) in the destination operand (the first operand). The source operand can be an XMM register or a 64-bit memory location. The destination operand is a general-purpose register. When the source operand is an XMM register, the double-precision floating-point value is contained in the low quadword of the register.

When a conversion is inexact, a truncated (round toward zero) value is returned.

If a converted result exceeds the range limits of signed doubleword integer (in non-64-bit modes or 64-bit mode with REX.W/VEX.W/EVEX.W=0), the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000H) is returned.

If a converted result exceeds the range limits of signed quadword integer (in 64-bit mode and REX.W/VEX.W/EVEX.W = 1), the floating-point invalid exception is raised, and if this exception is masked, the indefinite integer value (80000000_00000000H) is returned.
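The two range limits can be compared with a short sketch, assuming an x86-64 target and the SSE2 intrinsics _mm_cvttsd_si32 and _mm_cvttsd_si64 from <emmintrin.h> (the latter is only available when compiling for 64-bit mode). The wrapper names are hypothetical.

```c
#include <stdint.h>
#include <emmintrin.h> /* SSE2: _mm_set_sd, _mm_cvttsd_si32, _mm_cvttsd_si64 */

/* Hypothetical wrappers for the doubleword form (CVTTSD2SI r32) and the
   quadword form (REX.W CVTTSD2SI r64); both truncate toward zero. */
static int32_t cvttsd2si32(double x) { return _mm_cvttsd_si32(_mm_set_sd(x)); }
static int64_t cvttsd2si64(double x) { return _mm_cvttsd_si64(_mm_set_sd(x)); }
```

The value 3.0e9 is out of range for the doubleword form, which returns 80000000H, but converts exactly with the quadword form; 1.0e19 exceeds the quadword range and returns 80000000_00000000H.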

Legacy SSE instructions: In 64-bit mode, use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.

VEX.W1 and EVEX.W1 versions: promote the instruction to produce 64-bit data in 64-bit mode.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 
Software should ensure VCVTTSD2SI is encoded with VEX.L=0. Encoding VCVTTSD2SI with VEX.L=1 may encounter unpredictable behavior across different processor generations.

Operation 

(V)CVTTSD2SI (All versions)

IF 64-Bit Mode and OperandSize = 64
    THEN
        DEST[63:0] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[63:0]);
    ELSE
        DEST[31:0] <- Convert_Double_Precision_Floating_Point_To_Integer_Truncate(SRC[63:0]);
FI;

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTTSD2SI int _mm_cvttsd_i32(__m128d a);
VCVTTSD2SI int _mm_cvtt_roundsd_i32(__m128d a, int sae);
VCVTTSD2SI __int64 _mm_cvttsd_i64(__m128d a);
VCVTTSD2SI __int64 _mm_cvtt_roundsd_i64(__m128d a, int sae);
CVTTSD2SI int _mm_cvttsd_si32(__m128d a);
CVTTSD2SI __int64 _mm_cvttsd_si64(__m128d a);

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; additionally 
#UD If VEX.vvvv != 1111B.

EVEX-encoded instructions, see Exceptions Type E3NF. 




CVTTSS2SI—Convert with Truncation Scalar Single-Precision Floating-Point Value to Integer


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 2C /r CVTTSS2SI r32, xmm1/m32 | RM | V/V | SSE | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32 using truncation.
F3 REX.W 0F 2C /r CVTTSS2SI r64, xmm1/m32 | RM | V/N.E. | SSE | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64 using truncation.
VEX.128.F3.0F.W0 2C /r VCVTTSS2SI r32, xmm1/m32 | RM | V/V | AVX | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32 using truncation.
VEX.128.F3.0F.W1 2C /r VCVTTSS2SI r64, xmm1/m32 | RM | V/N.E.¹ | AVX | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64 using truncation.
EVEX.LIG.F3.0F.W0 2C /r VCVTTSS2SI r32, xmm1/m32{sae} | T1F | V/V | AVX512F | Convert one single-precision floating-point value from xmm1/m32 to one signed doubleword integer in r32 using truncation.
EVEX.LIG.F3.0F.W1 2C /r VCVTTSS2SI r64, xmm1/m32{sae} | T1F | V/N.E.¹ | AVX512F | Convert one single-precision floating-point value from xmm1/m32 to one signed quadword integer in r64 using truncation.


NOTES: 

1. For this specific instruction, VEX.W/EVEX.W is ignored in non-64-bit modes; the instruction behaves as if the W0 version is used.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (w) | ModRM:r/m (r) | NA | NA
T1F | ModRM:reg (w) | ModRM:r/m (r) | NA | NA


Description 

Converts a single-precision floating-point value in the source operand (the second operand) to a signed doubleword 
integer (or signed quadword integer if operand size is 64 bits) in the destination operand (the first operand). The 
source operand can be an XMM register or a 32-bit memory location. The destination operand is a general-purpose
register. When the source operand is an XMM register, the single-precision floating-point value is contained in the 
low doubleword of the register. 

When a conversion is inexact, a truncated (round toward zero) result is returned. If a converted result cannot be represented in the destination format, the floating-point invalid exception is raised. If this exception is masked, the indefinite integer value (80000000H, or 80000000_00000000H if operand size is 64 bits) is returned.
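The difference between truncation and MXCSR-controlled rounding can be seen by comparing CVTTSS2SI with CVTSS2SI. This sketch assumes an x86-64 target, the SSE intrinsics _mm_cvttss_si32 and _mm_cvtss_si32 from <xmmintrin.h>, and the default MXCSR rounding mode (round to nearest); the wrapper names are hypothetical.

```c
#include <stdint.h>
#include <xmmintrin.h> /* SSE: _mm_set_ss, _mm_cvttss_si32, _mm_cvtss_si32 */

/* Hypothetical wrappers: CVTTSS2SI always truncates toward zero, while
   CVTSS2SI follows the MXCSR rounding mode. */
static int32_t cvttss2si32(float x) { return _mm_cvttss_si32(_mm_set_ss(x)); }
static int32_t cvtss2si32(float x)  { return _mm_cvtss_si32(_mm_set_ss(x)); }
```

Under these assumptions, 2.7f gives 2 with truncation but 3 with rounding. A plain C conversion such as (int32_t)x of an in-range float also truncates, and x86-64 compilers typically emit CVTTSS2SI for it.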

Legacy SSE instructions: In 64-bit mode, use of the REX.W prefix promotes the instruction to 64-bit operation. See the summary chart at the beginning of this section for encoding data and limits.

VEX.W1 and EVEX.W1 versions: promote the instruction to produce 64-bit data in 64-bit mode.

Note: VEX.vvvv and EVEX.vvvv are reserved and must be 1111b, otherwise instructions will #UD. 

Software should ensure VCVTTSS2SI is encoded with VEX.L=0. Encoding VCVTTSS2SI with VEX.L=1 may 
encounter unpredictable behavior across different processor generations. 
Operation

(V)CVTTSS2SI (All versions)

IF 64-Bit Mode and OperandSize = 64
    THEN
        DEST[63:0] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0]);
    ELSE
        DEST[31:0] <- Convert_Single_Precision_Floating_Point_To_Integer_Truncate(SRC[31:0]);
FI;

Intel C/C++ Compiler Intrinsic Equivalent 

VCVTTSS2SI int _mm_cvttss_i32(__m128 a);
VCVTTSS2SI int _mm_cvtt_roundss_i32(__m128 a, int sae);
VCVTTSS2SI __int64 _mm_cvttss_i64(__m128 a);
VCVTTSS2SI __int64 _mm_cvtt_roundss_i64(__m128 a, int sae);
CVTTSS2SI int _mm_cvttss_si32(__m128 a);
CVTTSS2SI __int64 _mm_cvttss_si64(__m128 a);

SIMD Floating-Point Exceptions 

Invalid, Precision 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3; additionally
#UD If VEX.vvvv != 1111B.

EVEX-encoded instructions, see Exceptions Type E3NF. 
CWD/CDQ/CQO—Convert Word to Doubleword/Convert Doubleword to Quadword


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
99 | CWD | NP | Valid | Valid | DX:AX <- sign-extend of AX.
99 | CDQ | NP | Valid | Valid | EDX:EAX <- sign-extend of EAX.
REX.W + 99 | CQO | NP | Valid | N.E. | RDX:RAX <- sign-extend of RAX.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | NA | NA | NA | NA


Description 

Doubles the size of the operand in register AX, EAX, or RAX (depending on the operand size) by means of sign 
extension and stores the result in registers DX:AX, EDX:EAX, or RDX:RAX, respectively. The CWD instruction 
copies the sign (bit 15) of the value in the AX register into every bit position in the DX register. The CDQ instruction 
copies the sign (bit 31) of the value in the EAX register into every bit position in the EDX register. The CQO instruction
(available in 64-bit mode only) copies the sign (bit 63) of the value in the RAX register into every bit position
in the RDX register. 

The CWD instruction can be used to produce a doubleword dividend from a word before word division. The CDQ 
instruction can be used to produce a quadword dividend from a doubleword before doubleword division. The CQO 
instruction can be used to produce a double quadword dividend from a quadword before a quadword division. 

The CWD and CDQ mnemonics reference the same opcode. The CWD instruction is intended for use when the 
operand-size attribute is 16 and the CDQ instruction for when the operand-size attribute is 32. Some assemblers 
may force the operand size to 16 when CWD is used and to 32 when CDQ is used. Others may treat these 
mnemonics as synonyms (CWD/CDQ) and use the current setting of the operand-size attribute to determine the 
size of values to be converted, regardless of the mnemonic used. 

In 64-bit mode, use of the REX.W prefix promotes operation to 64 bits. The CQO mnemonic references the same
opcode as CWD/CDQ. See the summary chart at the beginning of this section for encoding data and limits.

Operation 

IF OperandSize = 16 (* CWD instruction *)
    THEN
        DX ← SignExtend(AX);
    ELSE IF OperandSize = 32 (* CDQ instruction *)
        THEN
            EDX ← SignExtend(EAX);
    ELSE IF 64-Bit Mode and OperandSize = 64 (* CQO instruction *)
        THEN
            RDX ← SignExtend(RAX);
FI;

Flags Affected 

None 

Exceptions (All Operating Modes) 

#UD If the LOCK prefix is used. 
DAA—Decimal Adjust AL after Addition

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
27 | DAA | NP | Invalid | Valid | Decimal adjust AL after addition.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | NA | NA | NA | NA


Description 

Adjusts the sum of two packed BCD values to create a packed BCD result. The AL register is the implied source and 
destination operand. The DAA instruction is only useful when it follows an ADD instruction that adds (binary addition)
two 2-digit, packed BCD values and stores a byte result in the AL register. The DAA instruction then adjusts
the contents of the AL register to contain the correct 2-digit, packed BCD result. If a decimal carry is detected, the 
CF and AF flags are set accordingly. 

This instruction executes as described above in compatibility mode and legacy mode. It is not valid in 64-bit mode. 


Operation 

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        old_AL ← AL;
        old_CF ← CF;
        CF ← 0;
        IF (((AL AND 0FH) > 9) or AF = 1)
            THEN
                AL ← AL + 6;
                CF ← old_CF or (Carry from AL ← AL + 6);
                AF ← 1;
            ELSE
                AF ← 0;
        FI;
        IF ((old_AL > 99H) or (old_CF = 1))
            THEN
                AL ← AL + 60H;
                CF ← 1;
            ELSE
                CF ← 0;
        FI;
FI;


Example 

ADD AL, BL    Before: AL=79H BL=35H EFLAGS(OSZAPC)=XXXXXX
              After:  AL=AEH BL=35H EFLAGS(OSZAPC)=110000
DAA           Before: AL=AEH BL=35H EFLAGS(OSZAPC)=110000
              After:  AL=14H BL=35H EFLAGS(OSZAPC)=X00111
DAA           Before: AL=2EH BL=35H EFLAGS(OSZAPC)=110000
              After:  AL=34H BL=35H EFLAGS(OSZAPC)=X00100
Flags Affected

The CF and AF flags are set if the adjustment of the value results in a decimal carry in either digit of the result (see 
the "Operation" section above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined. 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

#UD If the LOCK prefix is used. 

64-Bit Mode Exceptions 

#UD If in 64-bit mode. 
DAS—Decimal Adjust AL after Subtraction

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
2F | DAS | NP | Invalid | Valid | Decimal adjust AL after subtraction.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | NA | NA | NA | NA


Description 

Adjusts the result of the subtraction of two packed BCD values to create a packed BCD result. The AL register is the 
implied source and destination operand. The DAS instruction is only useful when it follows a SUB instruction that 
subtracts (binary subtraction) one 2-digit, packed BCD value from another and stores a byte result in the AL 
register. The DAS instruction then adjusts the contents of the AL register to contain the correct 2-digit, packed BCD 
result. If a decimal borrow is detected, the CF and AF flags are set accordingly. 

This instruction executes as described above in compatibility mode and legacy mode. It is not valid in 64-bit mode. 

Operation 

IF 64-Bit Mode
    THEN
        #UD;
    ELSE
        old_AL ← AL;
        old_CF ← CF;
        CF ← 0;
        IF (((AL AND 0FH) > 9) or AF = 1)
            THEN
                AL ← AL - 6;
                CF ← old_CF or (Borrow from AL ← AL - 6);
                AF ← 1;
            ELSE
                AF ← 0;
        FI;
        IF ((old_AL > 99H) or (old_CF = 1))
            THEN
                AL ← AL - 60H;
                CF ← 1;
        FI;
FI;

Example 

SUB AL, BL    Before: AL = 35H, BL = 47H, EFLAGS(OSZAPC) = XXXXXX
              After:  AL = EEH, BL = 47H, EFLAGS(OSZAPC) = 010111
DAS           Before: AL = EEH, BL = 47H, EFLAGS(OSZAPC) = 010111
              After:  AL = 88H, BL = 47H, EFLAGS(OSZAPC) = X10111


Flags Affected 

The CF and AF flags are set if the adjustment of the value results in a decimal borrow in either digit of the result 
(see the "Operation" section above). The SF, ZF, and PF flags are set according to the result. The OF flag is undefined.
Protected Mode Exceptions

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

#UD If the LOCK prefix is used. 

64-Bit Mode Exceptions

#UD If in 64-bit mode. 
DEC—Decrement by 1

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
FE /1 | DEC r/m8 | M | Valid | Valid | Decrement r/m8 by 1.
REX + FE /1 | DEC r/m8* | M | Valid | N.E. | Decrement r/m8 by 1.
FF /1 | DEC r/m16 | M | Valid | Valid | Decrement r/m16 by 1.
FF /1 | DEC r/m32 | M | Valid | Valid | Decrement r/m32 by 1.
REX.W + FF /1 | DEC r/m64 | M | Valid | N.E. | Decrement r/m64 by 1.
48+rw | DEC r16 | O | N.E. | Valid | Decrement r16 by 1.
48+rd | DEC r32 | O | N.E. | Valid | Decrement r32 by 1.


NOTES: 

* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (r, w) | NA | NA | NA
O | opcode + rd (r, w) | NA | NA | NA


Description 

Subtracts 1 from the destination operand, while preserving the state of the CF flag. The destination operand can be 
a register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag. 
(To perform a decrement operation that updates the CF flag, use a SUB instruction with an immediate operand of 1.)

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. 

In 64-bit mode, DEC r16 and DEC r32 are not encodable (because opcodes 48H through 4FH are REX prefixes).
Otherwise, the instruction's 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to 
additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. 

See the summary chart at the beginning of this section for encoding data and limits. 

Operation 

DEST ← DEST - 1;


Flags Affected 

The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result. 


Protected Mode Exceptions

#GP(0)           If the destination operand is located in a non-writable segment.
                 If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used but the destination is not a memory operand.
Real-Address Mode Exceptions

#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS              If a memory operand effective address is outside the SS segment limit.
#UD              If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.
#UD              If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0)           If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)           If the memory address is in a non-canonical form.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used but the destination is not a memory operand.
DIV—Unsigned Divide

Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
F6 /6 | DIV r/m8 | M | Valid | Valid | Unsigned divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder.
REX + F6 /6 | DIV r/m8* | M | Valid | N.E. | Unsigned divide AX by r/m8, with result stored in AL ← Quotient, AH ← Remainder.
F7 /6 | DIV r/m16 | M | Valid | Valid | Unsigned divide DX:AX by r/m16, with result stored in AX ← Quotient, DX ← Remainder.
F7 /6 | DIV r/m32 | M | Valid | Valid | Unsigned divide EDX:EAX by r/m32, with result stored in EAX ← Quotient, EDX ← Remainder.
REX.W + F7 /6 | DIV r/m64 | M | Valid | N.E. | Unsigned divide RDX:RAX by r/m64, with result stored in RAX ← Quotient, RDX ← Remainder.


NOTES: 

* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (w) | NA | NA | NA


Description 

Divides the unsigned value in the AX, DX:AX, EDX:EAX, or RDX:RAX registers (dividend) by the source operand
(divisor) and stores the result in the AX (AH:AL), DX:AX, EDX:EAX, or RDX:RAX registers. The source operand can
be a general-purpose register or a memory location. The action of this instruction depends on the operand size
(dividend/divisor). Division using a 64-bit operand is available only in 64-bit mode.

Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude.
Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.

In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to additional
registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is
applied, the instruction divides the unsigned value in RDX:RAX by the source operand and stores the quotient in
RAX, the remainder in RDX.

See the summary chart at the beginning of this section for encoding data and limits. See Table 3-15. 


Table 3-15. DIV Action

Operand Size | Dividend | Divisor | Quotient | Remainder | Maximum Quotient
Word/byte | AX | r/m8 | AL | AH | 255
Doubleword/word | DX:AX | r/m16 | AX | DX | 65,535
Quadword/doubleword | EDX:EAX | r/m32 | EAX | EDX | 2^32 - 1
Doublequadword/quadword | RDX:RAX | r/m64 | RAX | RDX | 2^64 - 1
Operation

IF SRC = 0
    THEN #DE; FI; (* Divide Error *)
IF OperandSize = 8 (* Word/Byte Operation *)
    THEN
        temp ← AX / SRC;
        IF temp > FFH
            THEN #DE; (* Divide error *)
            ELSE
                AL ← temp;
                AH ← AX MOD SRC;
        FI;
    ELSE IF OperandSize = 16 (* Doubleword/word operation *)
        THEN
            temp ← DX:AX / SRC;
            IF temp > FFFFH
                THEN #DE; (* Divide error *)
                ELSE
                    AX ← temp;
                    DX ← DX:AX MOD SRC;
            FI;
    ELSE IF OperandSize = 32 (* Quadword/doubleword operation *)
        THEN
            temp ← EDX:EAX / SRC;
            IF temp > FFFFFFFFH
                THEN #DE; (* Divide error *)
                ELSE
                    EAX ← temp;
                    EDX ← EDX:EAX MOD SRC;
            FI;
    ELSE IF 64-Bit Mode and OperandSize = 64 (* Doublequadword/quadword operation *)
        THEN
            temp ← RDX:RAX / SRC;
            IF temp > FFFFFFFFFFFFFFFFH
                THEN #DE; (* Divide error *)
                ELSE
                    RAX ← temp;
                    RDX ← RDX:RAX MOD SRC;
            FI;
FI;

Flags Affected 

The CF, OF, SF, ZF, AF, and PF flags are undefined. 
Protected Mode Exceptions

#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used.


Real-Address Mode Exceptions

#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#GP              If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                 If the DS, ES, FS, or GS register contains a NULL segment selector.
#SS(0)           If a memory operand effective address is outside the SS segment limit.
#UD              If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#GP(0)           If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS              If a memory operand effective address is outside the SS segment limit.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made.
#UD              If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0)           If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)           If the memory address is in a non-canonical form.
#DE              If the source operand (divisor) is 0.
                 If the quotient is too large for the designated register.
#PF(fault-code)  If a page fault occurs.
#AC(0)           If alignment checking is enabled and an unaligned memory reference is made while the
                 current privilege level is 3.
#UD              If the LOCK prefix is used.
DIVPD—Divide Packed Double-Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 5E /r DIVPD xmm1, xmm2/m128 | RM | V/V | SSE2 | Divide packed double-precision floating-point values in xmm1 by packed double-precision floating-point values in xmm2/mem.
VEX.NDS.128.66.0F.WIG 5E /r VDIVPD xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Divide packed double-precision floating-point values in xmm2 by packed double-precision floating-point values in xmm3/mem.
VEX.NDS.256.66.0F.WIG 5E /r VDIVPD ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Divide packed double-precision floating-point values in ymm2 by packed double-precision floating-point values in ymm3/mem.
EVEX.NDS.128.66.0F.W1 5E /r VDIVPD xmm1 {k1}{z}, xmm2, xmm3/m128/m64bcst | FV | V/V | AVX512VL AVX512F | Divide packed double-precision floating-point values in xmm2 by packed double-precision floating-point values in xmm3/m128/m64bcst and write results to xmm1 subject to writemask k1.
EVEX.NDS.256.66.0F.W1 5E /r VDIVPD ymm1 {k1}{z}, ymm2, ymm3/m256/m64bcst | FV | V/V | AVX512VL AVX512F | Divide packed double-precision floating-point values in ymm2 by packed double-precision floating-point values in ymm3/m256/m64bcst and write results to ymm1 subject to writemask k1.
EVEX.NDS.512.66.0F.W1 5E /r VDIVPD zmm1 {k1}{z}, zmm2, zmm3/m512/m64bcst{er} | FV | V/V | AVX512F | Divide packed double-precision floating-point values in zmm2 by packed double-precision FP values in zmm3/m512/m64bcst and write results to zmm1 subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Performs a SIMD divide of the double-precision floating-point values in the first source operand by the floating-point
values in the second source operand (the third operand). Results are written to the destination operand (the
first operand).

EVEX encoded versions: The first source operand (the second operand) is a ZMM/YMM/XMM register. The second 
source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector 
broadcasted from a 64-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally 
updated with writemask k1.

VEX.256 encoded version: The first source operand (the second operand) is a YMM register. The second source 
operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register. The upper 
bits (MAX_VL-1:256) of the corresponding destination are zeroed. 

VEX.128 encoded version: The first source operand (the second operand) is an XMM register. The second source
operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper
bits (MAX_VL-1:128) of the corresponding destination are zeroed.

128-bit Legacy SSE version: The second source operand (the second operand) can be an XMM register or a 128-bit
memory location. The destination is the same as the first source operand. The upper bits (MAX_VL-1:128) of the
corresponding destination are unmodified.
Operation

VDIVPD (EVEX encoded versions) 

(KL, VL) = (2, 128), (4, 256), (8, 512)
IF (VL = 512) AND (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC); ; refer to Table 2-4 in the Intel Architecture Instruction Set Extensions Programming Reference
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 64
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+63:i] ← SRC1[i+63:i] / SRC2[63:0]
                ELSE
                    DEST[i+63:i] ← SRC1[i+63:i] / SRC2[i+63:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+63:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+63:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0


VDIVPD (VEX.256 encoded version) 

DEST[63:0] ← SRC1[63:0] / SRC2[63:0]
DEST[127:64] ← SRC1[127:64] / SRC2[127:64]
DEST[191:128] ← SRC1[191:128] / SRC2[191:128]
DEST[255:192] ← SRC1[255:192] / SRC2[255:192]
DEST[MAX_VL-1:256] ← 0;


VDIVPD (VEX.128 encoded version) 

DEST[63:0] ← SRC1[63:0] / SRC2[63:0]
DEST[127:64] ← SRC1[127:64] / SRC2[127:64]
DEST[MAX_VL-1:128] ← 0;

DIVPD (128-bit Legacy SSE version) 

DEST[63:0] ← SRC1[63:0] / SRC2[63:0]
DEST[127:64] ← SRC1[127:64] / SRC2[127:64]
DEST[MAX_VL-1:128] (Unmodified)
Intel C/C++ Compiler Intrinsic Equivalent

VDIVPD __m512d _mm512_div_pd(__m512d a, __m512d b);

VDIVPD __m512d _mm512_mask_div_pd(__m512d s, __mmask8 k, __m512d a, __m512d b);

VDIVPD __m512d _mm512_maskz_div_pd(__mmask8 k, __m512d a, __m512d b);

VDIVPD __m256d _mm256_mask_div_pd(__m256d s, __mmask8 k, __m256d a, __m256d b);

VDIVPD __m256d _mm256_maskz_div_pd(__mmask8 k, __m256d a, __m256d b);

VDIVPD __m128d _mm_mask_div_pd(__m128d s, __mmask8 k, __m128d a, __m128d b);

VDIVPD __m128d _mm_maskz_div_pd(__mmask8 k, __m128d a, __m128d b);

VDIVPD __m512d _mm512_div_round_pd(__m512d a, __m512d b, int);

VDIVPD __m512d _mm512_mask_div_round_pd(__m512d s, __mmask8 k, __m512d a, __m512d b, int);

VDIVPD __m512d _mm512_maskz_div_round_pd(__mmask8 k, __m512d a, __m512d b, int);

VDIVPD __m256d _mm256_div_pd (__m256d a, __m256d b);

DIVPD __m128d _mm_div_pd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Divide-by-Zero, Precision, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2. 

EVEX-encoded instructions, see Exceptions Type E2. 
DIVPS—Divide Packed Single-Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
0F 5E /r DIVPS xmm1, xmm2/m128 | RM | V/V | SSE | Divide packed single-precision floating-point values in xmm1 by packed single-precision floating-point values in xmm2/mem.
VEX.NDS.128.0F.WIG 5E /r VDIVPS xmm1, xmm2, xmm3/m128 | RVM | V/V | AVX | Divide packed single-precision floating-point values in xmm2 by packed single-precision floating-point values in xmm3/mem.
VEX.NDS.256.0F.WIG 5E /r VDIVPS ymm1, ymm2, ymm3/m256 | RVM | V/V | AVX | Divide packed single-precision floating-point values in ymm2 by packed single-precision floating-point values in ymm3/mem.
EVEX.NDS.128.0F.W0 5E /r VDIVPS xmm1 {k1}{z}, xmm2, xmm3/m128/m32bcst | FV | V/V | AVX512VL AVX512F | Divide packed single-precision floating-point values in xmm2 by packed single-precision floating-point values in xmm3/m128/m32bcst and write results to xmm1 subject to writemask k1.
EVEX.NDS.256.0F.W0 5E /r VDIVPS ymm1 {k1}{z}, ymm2, ymm3/m256/m32bcst | FV | V/V | AVX512VL AVX512F | Divide packed single-precision floating-point values in ymm2 by packed single-precision floating-point values in ymm3/m256/m32bcst and write results to ymm1 subject to writemask k1.
EVEX.NDS.512.0F.W0 5E /r VDIVPS zmm1 {k1}{z}, zmm2, zmm3/m512/m32bcst{er} | FV | V/V | AVX512F | Divide packed single-precision floating-point values in zmm2 by packed single-precision floating-point values in zmm3/m512/m32bcst and write results to zmm1 subject to writemask k1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
FV | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Performs a SIMD divide of the four, eight or sixteen packed single-precision floating-point values in the first source 
operand (the second operand) by the four, eight or sixteen packed single-precision floating-point values in the 
second source operand (the third operand). Results are written to the destination operand (the first operand). 

EVEX encoded versions: The first source operand (the second operand) is a ZMM/YMM/XMM register. The second 
source operand can be a ZMM/YMM/XMM register, a 512/256/128-bit memory location or a 512/256/128-bit vector 
broadcasted from a 32-bit memory location. The destination operand is a ZMM/YMM/XMM register conditionally 
updated with writemask k1.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM 
register or a 256-bit memory location. The destination operand is a YMM register. 

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (MAX_VL-1:128)
of the corresponding ZMM register destination are zeroed.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register and the upper bits (MAX_VL-1:128) of the corresponding
ZMM register destination are unmodified.
Operation

VDIVPS (EVEX encoded versions) 

(KL, VL) = (4, 128), (8, 256), (16, 512)
IF (VL = 512) AND (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
FOR j ← 0 TO KL-1
    i ← j * 32
    IF k1[j] OR *no writemask*
        THEN
            IF (EVEX.b = 1) AND (SRC2 *is memory*)
                THEN
                    DEST[i+31:i] ← SRC1[i+31:i] / SRC2[31:0]
                ELSE
                    DEST[i+31:i] ← SRC1[i+31:i] / SRC2[i+31:i]
            FI;
        ELSE
            IF *merging-masking* ; merging-masking
                THEN *DEST[i+31:i] remains unchanged*
                ELSE ; zeroing-masking
                    DEST[i+31:i] ← 0
            FI
    FI;
ENDFOR
DEST[MAX_VL-1:VL] ← 0


VDIVPS (VEX.256 encoded version) 

DEST[31:0] ← SRC1[31:0] / SRC2[31:0]
DEST[63:32] ← SRC1[63:32] / SRC2[63:32]
DEST[95:64] ← SRC1[95:64] / SRC2[95:64]
DEST[127:96] ← SRC1[127:96] / SRC2[127:96]
DEST[159:128] ← SRC1[159:128] / SRC2[159:128]
DEST[191:160] ← SRC1[191:160] / SRC2[191:160]
DEST[223:192] ← SRC1[223:192] / SRC2[223:192]
DEST[255:224] ← SRC1[255:224] / SRC2[255:224]
DEST[MAX_VL-1:256] ← 0;


VDIVPS (VEX.128 encoded version) 

DEST[31:0] ← SRC1[31:0] / SRC2[31:0]
DEST[63:32] ← SRC1[63:32] / SRC2[63:32]
DEST[95:64] ← SRC1[95:64] / SRC2[95:64]
DEST[127:96] ← SRC1[127:96] / SRC2[127:96]
DEST[MAX_VL-1:128] ← 0
DIVPS (128-bit Legacy SSE version)

DEST[31:0] ← SRC1[31:0] / SRC2[31:0]
DEST[63:32] ← SRC1[63:32] / SRC2[63:32]
DEST[95:64] ← SRC1[95:64] / SRC2[95:64]
DEST[127:96] ← SRC1[127:96] / SRC2[127:96]
DEST[MAX_VL-1:128] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VDIVPS __m512 _mm512_div_ps(__m512 a, __m512 b);

VDIVPS __m512 _mm512_mask_div_ps(__m512 s, __mmask16 k, __m512 a, __m512 b);

VDIVPS __m512 _mm512_maskz_div_ps(__mmask16 k, __m512 a, __m512 b);

VDIVPS __m256 _mm256_mask_div_ps(__m256 s, __mmask8 k, __m256 a, __m256 b);

VDIVPS __m256 _mm256_maskz_div_ps(__mmask8 k, __m256 a, __m256 b);

VDIVPS __m128 _mm_mask_div_ps(__m128 s, __mmask8 k, __m128 a, __m128 b);

VDIVPS __m128 _mm_maskz_div_ps(__mmask8 k, __m128 a, __m128 b);

VDIVPS __m512 _mm512_div_round_ps(__m512 a, __m512 b, int);

VDIVPS __m512 _mm512_mask_div_round_ps(__m512 s, __mmask16 k, __m512 a, __m512 b, int);

VDIVPS __m512 _mm512_maskz_div_round_ps(__mmask16 k, __m512 a, __m512 b, int);

VDIVPS __m256 _mm256_div_ps (__m256 a, __m256 b);

DIVPS __m128 _mm_div_ps (__m128 a, __m128 b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Divide-by-Zero, Precision, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 2. 

EVEX-encoded instructions, see Exceptions Type E2. 
DIVSD—Divide Scalar Double-Precision Floating-Point Value

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F2 0F 5E /r DIVSD xmm1, xmm2/m64 | RM | V/V | SSE2 | Divide low double-precision floating-point value in xmm1 by low double-precision floating-point value in xmm2/m64.
VEX.NDS.128.F2.0F.WIG 5E /r VDIVSD xmm1, xmm2, xmm3/m64 | RVM | V/V | AVX | Divide low double-precision floating-point value in xmm2 by low double-precision floating-point value in xmm3/m64.
EVEX.NDS.LIG.F2.0F.W1 5E /r VDIVSD xmm1 {k1}{z}, xmm2, xmm3/m64{er} | T1S | V/V | AVX512F | Divide low double-precision floating-point value in xmm2 by low double-precision floating-point value in xmm3/m64.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Divides the low double-precision floating-point value in the first source operand by the low double-precision
floating-point value in the second source operand, and stores the double-precision floating-point result in the destination
operand. The second source operand can be an XMM register or a 64-bit memory location. The first source
and destination are XMM registers.

128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (MAX_VL-1:64)
of the corresponding ZMM destination register remain unchanged.

VEX.128 encoded version: The first source operand is an xmm register encoded by VEX.vvvv. The quadword at bits
127:64 of the destination operand is copied from the corresponding quadword of the first source operand. Bits
(MAX_VL-1:128) of the destination register are zeroed.

EVEX.128 encoded version: The first source operand is an xmm register encoded by EVEX.vvvv. The quadword
element of the destination operand at bits 127:64 is copied from the first source operand. Bits (MAX_VL-1:128)
of the destination register are zeroed.

EVEX version: The low quadword element of the destination is updated according to the writemask. 

Software should ensure VDIVSD is encoded with VEX.L=0. Encoding VDIVSD with VEX.L=1 may encounter unpredictable
behavior across different processor generations.
Operation

VDIVSD (EVEX encoded version) 

IF (EVEX.b = 1) AND SRC2 *is a register*
    THEN
        SET_RM(EVEX.RC);
    ELSE
        SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[63:0] ← SRC1[63:0] / SRC2[63:0]
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[63:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[63:0] ← 0
        FI;
FI;
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

VDIVSD (VEX.128 encoded version) 

DEST[63:0] ← SRC1[63:0] / SRC2[63:0]
DEST[127:64] ← SRC1[127:64]
DEST[MAX_VL-1:128] ← 0

DIVSD (128-bit Legacy SSE version) 

DEST[63:0] ← DEST[63:0] / SRC[63:0]
DEST[MAX_VL-1:64] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VDIVSD __m128d _mm_mask_div_sd(__m128d s, __mmask8 k, __m128d a, __m128d b);

VDIVSD __m128d _mm_maskz_div_sd(__mmask8 k, __m128d a, __m128d b);

VDIVSD __m128d _mm_div_round_sd(__m128d a, __m128d b, int);

VDIVSD __m128d _mm_mask_div_round_sd(__m128d s, __mmask8 k, __m128d a, __m128d b, int);

VDIVSD __m128d _mm_maskz_div_round_sd(__mmask8 k, __m128d a, __m128d b, int);

DIVSD __m128d _mm_div_sd (__m128d a, __m128d b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Divide-by-Zero, Precision, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3. 
DIVSS—Divide Scalar Single-Precision Floating-Point Values

Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
F3 0F 5E /r DIVSS xmm1, xmm2/m32 | RM | V/V | SSE | Divide low single-precision floating-point value in xmm1 by low single-precision floating-point value in xmm2/m32.
VEX.NDS.128.F3.0F.WIG 5E /r VDIVSS xmm1, xmm2, xmm3/m32 | RVM | V/V | AVX | Divide low single-precision floating-point value in xmm2 by low single-precision floating-point value in xmm3/m32.
EVEX.NDS.LIG.F3.0F.W0 5E /r VDIVSS xmm1 {k1}{z}, xmm2, xmm3/m32{er} | T1S | V/V | AVX512F | Divide low single-precision floating-point value in xmm2 by low single-precision floating-point value in xmm3/m32.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RM | ModRM:reg (r, w) | ModRM:r/m (r) | NA | NA
RVM | ModRM:reg (w) | VEX.vvvv | ModRM:r/m (r) | NA
T1S | ModRM:reg (w) | EVEX.vvvv | ModRM:r/m (r) | NA


Description 

Divides the low single-precision floating-point value in the first source operand by the low single-precision floating-point value in the second source operand, and stores the single-precision floating-point result in the destination operand. The second source operand can be an XMM register or a 32-bit memory location.

128-bit Legacy SSE version: The first source operand and the destination operand are the same. Bits (MAX_VL-1:32) of the corresponding ZMM destination register remain unchanged.

VEX.128 encoded version: The first source operand is an XMM register encoded by VEX.vvvv. The three high-order doublewords of the destination operand are copied from the first source operand. Bits (MAX_VL-1:128) of the destination register are zeroed.

EVEX.128 encoded version: The first source operand is an xmm register encoded by EVEX.vvvv. The doubleword 
elements of the destination operand at bits 127:32 are copied from the first source operand. Bits (MAX_VL-1:128) 
of the destination register are zeroed. 

EVEX version: The low doubleword element of the destination is updated according to the writemask. 

Software should ensure VDIVSS is encoded with VEX.L=0. Encoding VDIVSS with VEX.L=1 may encounter unpredictable behavior across different processor generations.
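The legacy SSE and VEX.128 forms differ only in where bits 127:32 of the destination come from. A sketch, assuming a hypothetical representation of an XMM register as `float[4]` (element 0 is bits 31:0); the function names are illustrative, and bits above 127 are outside the model.

```c
/* Legacy DIVSS: DEST is also the first source; only element 0 is written. */
void divss_legacy(float dest[4], const float src[4])
{
    dest[0] = dest[0] / src[0];
    /* dest[1..3] keep their previous contents (as do bits MAX_VL-1:128). */
}

/* VEX.128 VDIVSS: a separate first source supplies bits 127:32. */
void vdivss_vex128(float dest[4], const float src1[4], const float src2[4])
{
    dest[0] = src1[0] / src2[0];
    dest[1] = src1[1];   /* bits 127:32 copied from the first source */
    dest[2] = src1[2];
    dest[3] = src1[3];
    /* Bits MAX_VL-1:128 would be zeroed; not representable in this model. */
}
```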
Operation

VDIVSS (EVEX encoded version)
IF (EVEX.b = 1) AND SRC2 *is a register*
THEN
    SET_RM(EVEX.RC);
ELSE
    SET_RM(MXCSR.RM);
FI;
IF k1[0] or *no writemask*
    THEN DEST[31:0] ← SRC1[31:0] / SRC2[31:0]
    ELSE
        IF *merging-masking* ; merging-masking
            THEN *DEST[31:0] remains unchanged*
            ELSE ; zeroing-masking
                DEST[31:0] ← 0
        FI;
FI;
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

VDIVSS (VEX.128 encoded version)
DEST[31:0] ← SRC1[31:0] / SRC2[31:0]
DEST[127:32] ← SRC1[127:32]
DEST[MAX_VL-1:128] ← 0

DIVSS (128-bit Legacy SSE version)
DEST[31:0] ← DEST[31:0] / SRC[31:0]
DEST[MAX_VL-1:32] (Unmodified)

Intel C/C++ Compiler Intrinsic Equivalent 

VDIVSS __m128 _mm_mask_div_ss(__m128 s, __mmask8 k, __m128 a, __m128 b);

VDIVSS __m128 _mm_maskz_div_ss(__mmask8 k, __m128 a, __m128 b);

VDIVSS __m128 _mm_div_round_ss(__m128 a, __m128 b, int);

VDIVSS __m128 _mm_mask_div_round_ss(__m128 s, __mmask8 k, __m128 a, __m128 b, int);

VDIVSS __m128 _mm_maskz_div_round_ss(__mmask8 k, __m128 a, __m128 b, int);

DIVSS __m128 _mm_div_ss(__m128 a, __m128 b);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Divide-by-Zero, Precision, Denormal 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 3. 

EVEX-encoded instructions, see Exceptions Type E3. 
DPPD — Dot Product of Packed Double Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A 41 /r ib DPPD xmm1, xmm2/m128, imm8 | RMI | V/V | SSE4_1 | Selectively multiply packed DP floating-point values from xmm1 with packed DP floating-point values from xmm2, add and selectively store the packed DP floating-point values to xmm1.
VEX.NDS.128.66.0F3A.WIG 41 /r ib VDPPD xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Selectively multiply packed DP floating-point values from xmm2 with packed DP floating-point values from xmm3, add and selectively store the packed DP floating-point values to xmm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8


Description 

Conditionally multiplies the packed double-precision floating-point values in the destination operand (first operand) with the packed double-precision floating-point values in the source (second operand) depending on a mask extracted from bits [5:4] of the immediate operand (third operand). If a condition mask bit is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.

The two resulting double-precision values are summed into an intermediate result. The intermediate result is 
conditionally broadcasted to the destination using a broadcast mask specified by bits [1:0] of the immediate byte. 

If a broadcast mask bit is "1", the intermediate result is copied to the corresponding qword element in the destination operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.
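The selection and broadcast steps can be checked against a small scalar model of DP_Primitive for DPPD. A sketch under stated assumptions: `dppd_model` is a hypothetical name, element 0 of each array stands for bits 63:0, and neither exception flags nor the implementation-dependent NaN placement is modeled.

```c
#include <stdint.h>

/* Scalar model of DP_Primitive for DPPD.
   imm8[5:4] select which products enter the sum; imm8[1:0] select
   which destination elements receive the sum (others become +0.0). */
void dppd_model(double dest[2], const double a[2], const double b[2], uint8_t imm8)
{
    double t0 = (imm8 & 0x10) ? a[0] * b[0] : +0.0;  /* imm8[4] */
    double t1 = (imm8 & 0x20) ? a[1] * b[1] : +0.0;  /* imm8[5] */
    double sum = t0 + t1;                            /* intermediate result */
    dest[0] = (imm8 & 0x01) ? sum : +0.0;            /* imm8[0] */
    dest[1] = (imm8 & 0x02) ? sum : +0.0;            /* imm8[1] */
}
```

For example, imm8 = 31H forms the full two-element dot product and writes it to the low element only, zeroing the high one.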

DPPD follows the NaN forwarding rules stated in the Software Developer's Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination are implementation dependent. NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register, and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

If VDPPD is encoded with VEX.L=1, an attempt to execute the instruction will cause a #UD exception.
Operation

DP_Primitive (SRC1, SRC2)
IF (imm8[4] = 1)
    THEN Temp1[63:0] ← SRC1[63:0] * SRC2[63:0]; // update SIMD exception flags
    ELSE Temp1[63:0] ← +0.0; FI;
IF (imm8[5] = 1)
    THEN Temp1[127:64] ← SRC1[127:64] * SRC2[127:64]; // update SIMD exception flags
    ELSE Temp1[127:64] ← +0.0; FI;
/* If unmasked exception reported, execute exception handler */
Temp2[63:0] ← Temp1[63:0] + Temp1[127:64]; // update SIMD exception flags
/* If unmasked exception reported, execute exception handler */
IF (imm8[0] = 1)
    THEN DEST[63:0] ← Temp2[63:0];
    ELSE DEST[63:0] ← +0.0; FI;
IF (imm8[1] = 1)
    THEN DEST[127:64] ← Temp2[63:0];
    ELSE DEST[127:64] ← +0.0; FI;

DPPD (128-bit Legacy SSE version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] (Unmodified)

VDPPD (VEX.128 encoded version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] ← 0

Flags Affected 

None 

Intel C/C++ Compiler Intrinsic Equivalent 

DPPD: __m128d _mm_dp_pd (__m128d a, __m128d b, const int mask);

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Exceptions are determined separately for each add and multiply operation. Unmasked exceptions will leave the 
destination untouched. 

Other Exceptions 

See Exceptions Type 2; additionally:
#UD If VEX.L=1.
DPPS — Dot Product of Packed Single Precision Floating-Point Values


Opcode/Instruction | Op/En | 64/32-bit Mode | CPUID Feature Flag | Description
66 0F 3A 40 /r ib DPPS xmm1, xmm2/m128, imm8 | RMI | V/V | SSE4_1 | Selectively multiply packed SP floating-point values from xmm1 with packed SP floating-point values from xmm2, add and selectively store the packed SP floating-point values or zero values to xmm1.
VEX.NDS.128.66.0F3A.WIG 40 /r ib VDPPS xmm1, xmm2, xmm3/m128, imm8 | RVMI | V/V | AVX | Multiply packed SP floating-point values from xmm2 with packed SP floating-point values from xmm3/mem, selectively add, and store to xmm1.
VEX.NDS.256.66.0F3A.WIG 40 /r ib VDPPS ymm1, ymm2, ymm3/m256, imm8 | RVMI | V/V | AVX | Multiply packed single-precision floating-point values from ymm2 with packed SP floating-point values from ymm3/mem, selectively add pairs of elements, and store to ymm1.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:reg (r, w) | ModRM:r/m (r) | imm8 | NA
RVMI | ModRM:reg (w) | VEX.vvvv (r) | ModRM:r/m (r) | imm8


Description 

Conditionally multiplies the packed single-precision floating-point values in the destination operand (first operand) with the packed single-precision floating-point values in the source (second operand) depending on a mask extracted from the high 4 bits of the immediate byte (third operand). If a condition mask bit in imm8[7:4] is zero, the corresponding multiplication is replaced by a value of 0.0 in the manner described by Section 12.8.4 of Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.

The four resulting single-precision values are summed into an intermediate result. The intermediate result is conditionally broadcasted to the destination using a broadcast mask specified by bits [3:0] of the immediate byte.

If a broadcast mask bit is "1", the intermediate result is copied to the corresponding dword element in the destination operand. If a broadcast mask bit is zero, the corresponding element in the destination is set to zero.
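The same two-mask structure applies to DPPS with four elements. The scalar sketch below is hypothetical (`dpps128_model` is an illustrative name; element 0 is bits 31:0; exception flags and NaN placement are not modeled) and keeps the pseudocode's summation order, (Temp1[31:0] + Temp1[63:32]) + (Temp1[95:64] + Temp1[127:96]), because a different association order can round differently.

```c
#include <stdint.h>

/* Scalar model of DP_Primitive for one 128-bit DPPS lane. */
void dpps128_model(float dest[4], const float a[4], const float b[4], uint8_t imm8)
{
    float t[4];
    for (int i = 0; i < 4; i++)                 /* imm8[7:4]: condition mask */
        t[i] = (imm8 & (0x10 << i)) ? a[i] * b[i] : +0.0f;

    float temp2 = t[0] + t[1];                  /* Temp2 */
    float temp3 = t[2] + t[3];                  /* Temp3 */
    float temp4 = temp2 + temp3;                /* Temp4: intermediate result */

    for (int i = 0; i < 4; i++)                 /* imm8[3:0]: broadcast mask */
        dest[i] = (imm8 & (1 << i)) ? temp4 : +0.0f;
}
```

For the VEX.256 form, the Operation section applies the same primitive, with the same imm8, separately to bits 127:0 and 255:128, so no sum crosses between the two halves.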

DPPS follows the NaN forwarding rules stated in the Software Developer's Manual, vol. 1, table 4.7. These rules do not cover horizontal prioritization of NaNs. Horizontal propagation of NaNs to the destination and the positioning of those NaNs in the destination are implementation dependent. NaNs on the input sources or computationally generated NaNs will have at least one NaN propagated to the destination.

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination is not distinct from the first source XMM register, and the upper bits (VLMAX-1:128) of the corresponding YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128) of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM register or a 256-bit memory location. The destination operand is a YMM register.
Operation

DP_Primitive (SRC1, SRC2)
IF (imm8[4] = 1)
    THEN Temp1[31:0] ← SRC1[31:0] * SRC2[31:0]; // update SIMD exception flags
    ELSE Temp1[31:0] ← +0.0; FI;
IF (imm8[5] = 1)
    THEN Temp1[63:32] ← SRC1[63:32] * SRC2[63:32]; // update SIMD exception flags
    ELSE Temp1[63:32] ← +0.0; FI;
IF (imm8[6] = 1)
    THEN Temp1[95:64] ← SRC1[95:64] * SRC2[95:64]; // update SIMD exception flags
    ELSE Temp1[95:64] ← +0.0; FI;
IF (imm8[7] = 1)
    THEN Temp1[127:96] ← SRC1[127:96] * SRC2[127:96]; // update SIMD exception flags
    ELSE Temp1[127:96] ← +0.0; FI;
Temp2[31:0] ← Temp1[31:0] + Temp1[63:32]; // update SIMD exception flags
/* If unmasked exception reported, execute exception handler */
Temp3[31:0] ← Temp1[95:64] + Temp1[127:96]; // update SIMD exception flags
/* If unmasked exception reported, execute exception handler */
Temp4[31:0] ← Temp2[31:0] + Temp3[31:0]; // update SIMD exception flags
/* If unmasked exception reported, execute exception handler */
IF (imm8[0] = 1)
    THEN DEST[31:0] ← Temp4[31:0];
    ELSE DEST[31:0] ← +0.0; FI;
IF (imm8[1] = 1)
    THEN DEST[63:32] ← Temp4[31:0];
    ELSE DEST[63:32] ← +0.0; FI;
IF (imm8[2] = 1)
    THEN DEST[95:64] ← Temp4[31:0];
    ELSE DEST[95:64] ← +0.0; FI;
IF (imm8[3] = 1)
    THEN DEST[127:96] ← Temp4[31:0];
    ELSE DEST[127:96] ← +0.0; FI;

DPPS (128-bit Legacy SSE version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] (Unmodified)

VDPPS (VEX.128 encoded version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[VLMAX-1:128] ← 0

VDPPS (VEX.256 encoded version)
DEST[127:0] ← DP_Primitive(SRC1[127:0], SRC2[127:0]);
DEST[255:128] ← DP_Primitive(SRC1[255:128], SRC2[255:128]);

Flags Affected 

None 
Intel C/C++ Compiler Intrinsic Equivalent

(V)DPPS: __m128 _mm_dp_ps (__m128 a, __m128 b, const int mask);

VDPPS: __m256 _mm256_dp_ps (__m256 a, __m256 b, const int mask);
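The `mask` argument of these intrinsics is the imm8 control byte, so it must be a compile-time constant. The macros below are hypothetical helpers (not defined by any Intel header) that pack the two masks described above into that byte; the C11 `_Static_assert` lines confirm at compile time that they yield constant expressions with the expected values.

```c
/* Hypothetical helpers, not part of any Intel header.
   mul:   bit i selects whether product i enters the sum   (imm8[7:4] for DPPS)
   store: bit i selects whether element i receives the sum (imm8[3:0] for DPPS) */
#define DPPS_IMM8(mul, store) ((((mul) & 0xF) << 4) | ((store) & 0xF))

/* DPPD uses only imm8[5:4] (products) and imm8[1:0] (stores). */
#define DPPD_IMM8(mul, store) ((((mul) & 0x3) << 4) | ((store) & 0x3))

/* Full four-element dot product, written to element 0 only. */
_Static_assert(DPPS_IMM8(0xF, 0x1) == 0xF1, "DPPS: all products, low element");
/* Two-element dot product broadcast to both elements. */
_Static_assert(DPPD_IMM8(0x3, 0x3) == 0x33, "DPPD: both products, both elements");
```

With SSE4.1 code generation enabled (for example, `-msse4.1` with GCC or Clang), a call such as `_mm_dp_ps(a, b, DPPS_IMM8(0xF, 0x1))` passes the packed constant directly.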

SIMD Floating-Point Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Exceptions are determined separately for each add and multiply operation, in the order of their execution. 
Unmasked exceptions will leave the destination operands unchanged. 

Other Exceptions 

See Exceptions Type 2. 
EMMS—Empty MMX Technology State


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F 77 | EMMS | NP | Valid | Valid | Set the x87 FPU tag word to empty.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | NA | NA | NA | NA


Description 

Sets the values of all the tags in the x87 FPU tag word to empty (all 1s). This operation marks the x87 FPU data registers (which are aliased to the MMX technology registers) as available for use by x87 FPU floating-point instructions. (See Figure 8-7 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for the format of the x87 FPU tag word.) All other MMX instructions (other than the EMMS instruction) set all the tags in the x87 FPU tag word to valid (all 0s).
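The tag word holds a 2-bit tag for each of the eight physical x87 data registers (register i in bits 2i+1:2i), encoded as 00 = valid, 01 = zero, 10 = special, 11 = empty. The sketch below shows why EMMS yields FFFFH and other MMX instructions yield 0000H; the enum and function names are hypothetical and only the bit layout is modeled.

```c
#include <stdint.h>

/* x87 tag encodings (two bits per physical register R0-R7). */
enum x87_tag { TAG_VALID = 0, TAG_ZERO = 1, TAG_SPECIAL = 2, TAG_EMPTY = 3 };

/* Builds a tag word with the same tag in all eight fields. */
static uint16_t set_all_tags(enum x87_tag tag)
{
    uint16_t word = 0;
    for (int i = 0; i < 8; i++)
        word |= (uint16_t)(tag << (2 * i));   /* register i: bits 2i+1:2i */
    return word;
}

uint16_t after_emms(void)     { return set_all_tags(TAG_EMPTY); }  /* FFFFH */
uint16_t after_mmx_insn(void) { return set_all_tags(TAG_VALID); }  /* 0000H */

/* Reads back the tag of physical register reg (0-7). */
enum x87_tag tag_of(uint16_t word, int reg)
{
    return (enum x87_tag)((word >> (2 * reg)) & 3);
}
```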

The EMMS instruction must be used to clear the MMX technology state at the end of all MMX technology procedures 
or subroutines and before calling other procedures or subroutines that may execute x87 floating-point instructions. 
If a floating-point instruction loads one of the registers in the x87 FPU data register stack before the x87 FPU tag 
word has been reset by the EMMS instruction, an x87 floating-point register stack overflow can occur that will 
result in an x87 floating-point exception or incorrect result. 

EMMS operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

x87FPUTagWord ← FFFFH;

Intel C/C++ Compiler Intrinsic Equivalent 

void _mm_empty() 

Flags Affected 

None 

Protected Mode Exceptions 

#UD If CR0.EM[bit 2] = 1. 

#NM If CR0.TS[bit 3] = 1. 

#MF If there is a pending FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
ENTER—Make Stack Frame for Procedure Parameters


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
C8 iw 00 | ENTER imm16, 0 | II | Valid | Valid | Create a stack frame for a procedure.
C8 iw 01 | ENTER imm16, 1 | II | Valid | Valid | Create a stack frame with a nested pointer for a procedure.
C8 iw ib | ENTER imm16, imm8 | II | Valid | Valid | Create a stack frame with nested pointers for a procedure.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
II | iw | imm8 | NA | NA


Description 

Creates a stack frame (comprising space for dynamic storage and 1-32 frame pointers) for a procedure. The first operand (imm16) specifies the size of the dynamic storage in the stack frame (that is, the number of bytes dynamically allocated on the stack for the procedure). The second operand (imm8) gives the lexical nesting level (0 to 31) of the procedure. The nesting level (imm8 mod 32) and the OperandSize attribute determine the size in bytes of the storage space for frame pointers.

The nesting level determines the number of frame pointers that are copied into the "display area" of the new stack 
frame from the preceding frame. The default size of the frame pointer is the StackAddrSize attribute, but can be 
overridden using the 66H prefix. Thus, the OperandSize attribute determines the size of each frame pointer that 
will be copied into the stack frame and the data being transferred from SP/ESP/RSP register into the BP/EBP/RBP 
register. 

The ENTER and companion LEAVE instructions are provided to support block structured languages. The ENTER 
instruction (when used) is typically the first instruction in a procedure and is used to set up a new stack frame for 
a procedure. The LEAVE instruction is then used at the end of the procedure (just before the RET instruction) to 
release the stack frame. 

If the nesting level is 0, the processor pushes the frame pointer from the BP/EBP/RBP register onto the stack, 
copies the current stack pointer from the SP/ESP/RSP register into the BP/EBP/RBP register, and loads the 
SP/ESP/RSP register with the current stack-pointer value minus the value in the size operand. For nesting levels of 
1 or greater, the processor pushes additional frame pointers on the stack before adjusting the stack pointer. These 
additional frame pointers provide the called procedure with access points to other nested frames on the stack. See 
"Procedure Calls for Block-Structured Languages" in Chapter 6 of the Intel® 64 and IA-32 Architectures Software 
Developer's Manual, Volume 1, for more information about the actions of the ENTER instruction. 

The ENTER instruction causes a page fault whenever a write using the final value of the stack pointer (within the 
current stack segment) would do so. 

In 64-bit mode, the default operation size is 64 bits; 32-bit operation size cannot be encoded. Use of the 66H prefix changes the frame pointer operand size to 16 bits.

When the 66H prefix is used and causes the OperandSize attribute to be less than the StackAddrSize, software is responsible for the following:

• The companion LEAVE instruction must also use the 66H prefix.

• The value in the RBP/EBP register prior to executing "66H ENTER" must be within the same 16KByte region of the current stack pointer (RSP/ESP), such that the value of RBP/EBP after "66H ENTER" remains a valid address in the stack. This ensures "66H LEAVE" can restore 16 bits of data from the stack.


3-304 Vol. 2A 


INSTRUCTION SET REFERENCE, A-L 


Operation 

AllocSize ← imm16;
NestingLevel ← imm8 MOD 32;
IF (OperandSize = 64)
    THEN
        Push(RBP); (* RSP decrements by 8 *)
        FrameTemp ← RSP;
    ELSE IF OperandSize = 32
        THEN
            Push(EBP); (* (E)SP decrements by 4 *)
            FrameTemp ← ESP;
    ELSE (* OperandSize = 16 *)
            Push(BP); (* RSP or (E)SP decrements by 2 *)
            FrameTemp ← SP;
FI;
IF NestingLevel = 0
    THEN GOTO CONTINUE;
FI;
IF (NestingLevel > 1)
    THEN FOR I ← 1 to (NestingLevel - 1)
        DO
            IF (OperandSize = 64)
                THEN
                    RBP ← RBP - 8;
                    Push([RBP]); (* Quadword push *)
                ELSE IF OperandSize = 32
                    THEN
                        IF StackSize = 32
                            THEN
                                EBP ← EBP - 4;
                                Push([EBP]); (* Doubleword push *)
                            ELSE (* StackSize = 16 *)
                                BP ← BP - 4;
                                Push([BP]); (* Doubleword push *)
                        FI;
                ELSE (* OperandSize = 16 *)
                    IF StackSize = 32
                        THEN
                            EBP ← EBP - 2;
                            Push([EBP]); (* Word push *)
                        ELSE (* StackSize = 16 *)
                            BP ← BP - 2;
                            Push([BP]); (* Word push *)
                    FI;
            FI;
        OD;
FI;
IF (OperandSize = 64) (* NestingLevel ≥ 1 *)
    THEN
        Push(FrameTemp); (* Quadword push and RSP decrements by 8 *)
    ELSE IF OperandSize = 32
        THEN
            Push(FrameTemp); (* Doubleword push and (E)SP decrements by 4 *)
    ELSE (* OperandSize = 16 *)
            Push(FrameTemp); (* Word push and RSP|ESP|SP decrements by 2 *)
FI;
CONTINUE:
IF 64-Bit Mode (StackSize = 64)
    THEN
        RBP ← FrameTemp;
        RSP ← RSP - AllocSize;
    ELSE IF OperandSize = 32
        THEN
            EBP ← FrameTemp;
            ESP ← ESP - AllocSize;
    ELSE (* OperandSize = 16 *)
            BP ← FrameTemp[15:0]; (* Bits 16 and above of applicable RBP/EBP are unmodified *)
            SP ← SP - AllocSize;
FI;
END;
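The 64-bit path of the Operation above can be traced on a toy stack. This sketch assumes 64-bit mode with the default 64-bit operand size; `toy_cpu`, `push64`, `load64`, and `enter64` are hypothetical names, memory is a 4 KB byte array indexed by address, and segment limits, canonical-address checks, and page faults are not modeled.

```c
#include <stdint.h>
#include <string.h>

/* Toy machine state: a small byte-addressed memory plus RSP and RBP.
   No bounds or fault checking is performed. */
typedef struct {
    uint8_t  mem[4096];
    uint64_t rsp, rbp;
} toy_cpu;

static void push64(toy_cpu *c, uint64_t v)      /* Push(): RSP -= 8, then store */
{
    c->rsp -= 8;
    memcpy(&c->mem[c->rsp], &v, sizeof v);
}

static uint64_t load64(const toy_cpu *c, uint64_t addr)
{
    uint64_t v;
    memcpy(&v, &c->mem[addr], sizeof v);
    return v;
}

/* ENTER alloc_size, imm8 with OperandSize = 64. */
void enter64(toy_cpu *c, uint16_t alloc_size, uint8_t imm8)
{
    unsigned level = imm8 % 32;                 /* NestingLevel */
    push64(c, c->rbp);                          /* Push(RBP) */
    uint64_t frame_temp = c->rsp;               /* FrameTemp <- RSP */
    if (level > 0) {
        for (unsigned i = 1; i < level; i++) {  /* copy outer frame pointers */
            c->rbp -= 8;
            push64(c, load64(c, c->rbp));       /* Push([RBP]) */
        }
        push64(c, frame_temp);                  /* Push(FrameTemp) */
    }
    c->rbp = frame_temp;                        /* CONTINUE: RBP <- FrameTemp */
    c->rsp -= alloc_size;                       /* RSP <- RSP - AllocSize */
}
```

With nesting level 0 this reduces to the familiar `push rbp; mov rbp, rsp; sub rsp, imm16` prologue.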

Flags Affected 

None. 

Protected Mode Exceptions 

#SS(0) If the new value of the SP or ESP register is outside the stack segment limit. 

#PF(fault-code) If a page fault occurs or if a write using the final value of the stack pointer (within the current stack segment) would cause a page fault.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#SS If the new value of the SP or ESP register is outside the stack segment limit. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#SS(0) If the new value of the SP or ESP register is outside the stack segment limit. 

#PF(fault-code) If a page fault occurs or if a write using the final value of the stack pointer (within the current stack segment) would cause a page fault.

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If the stack address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs or if a write using the final value of the stack pointer (within the current 
stack segment) would cause a page fault. 

#UD If the LOCK prefix is used. 
EXTRACTPS—Extract Packed Floating-Point Values


Opcode/Instruction | Op/En | 64/32 bit Mode Support | CPUID Feature Flag | Description
66 0F 3A 17 /r ib EXTRACTPS reg/m32, xmm1, imm8 | RMI | V/V | SSE4_1 | Extract one single-precision floating-point value from xmm1 at the offset specified by imm8 and store the result in reg or m32. Zero extend the results in 64-bit register if applicable.
VEX.128.66.0F3A.WIG 17 /r ib VEXTRACTPS reg/m32, xmm1, imm8 | RMI | V/V | AVX | Extract one single-precision floating-point value from xmm1 at the offset specified by imm8 and store the result in reg or m32. Zero extend the results in 64-bit register if applicable.
EVEX.128.66.0F3A.WIG 17 /r ib VEXTRACTPS reg/m32, xmm1, imm8 | T1S | V/V | AVX512F | Extract one single-precision floating-point value from xmm1 at the offset specified by imm8 and store the result in reg or m32. Zero extend the results in 64-bit register if applicable.



Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
RMI | ModRM:r/m (w) | ModRM:reg (r) | imm8 | NA
T1S | ModRM:r/m (w) | ModRM:reg (r) | imm8 | NA


Description 

Extracts a single-precision floating-point value from the source operand (second operand) at the 32-bit offset specified by imm8. Immediate bits higher than the most significant offset for the vector length are ignored.

The extracted single-precision floating-point value is stored in the low 32 bits of the destination operand.

In 64-bit mode, the destination register operand has a default operand size of 64 bits. The upper 32 bits of the register are filled with zero. REX.W is ignored.

VEX.128 and EVEX encoded version: When VEX.W1 or EVEX.W1 form is used in 64-bit mode with a general purpose register (GPR) as a destination operand, the packed single quantity is zero extended to 64 bits.

VEX.vvvv/EVEX.vvvv is reserved and must be 1111b; otherwise, the instruction will #UD.

128-bit Legacy SSE version: When a REX.W prefix is used in 64-bit mode with a general purpose register (GPR) as 
a destination operand, the packed single quantity is zero extended to 64 bits. 

The source register is an XMM register. Imm8[1:0] determines the starting DWORD offset from which to extract the 32-bit floating-point value.
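The value written to a general-purpose register is the raw IEEE-754 bit pattern of the selected element, not an integer conversion of it. A sketch, assuming a hypothetical `float[4]` representation of the source XMM register (element 0 is bits 31:0) and the 64-bit-mode register-destination case; `extractps_to_gpr` is an illustrative name.

```c
#include <stdint.h>
#include <string.h>

/* Models EXTRACTPS/VEXTRACTPS with a 64-bit GPR destination. */
uint64_t extractps_to_gpr(const float src[4], uint8_t imm8)
{
    unsigned offset = imm8 & 3;                /* SRC_OFFSET <- IMM8[1:0] */
    uint32_t bits;
    memcpy(&bits, &src[offset], sizeof bits);  /* bit copy, no float-to-int conversion */
    return (uint64_t)bits;                     /* DEST[63:32] <- 0 */
}
```

For instance, selecting an element holding -2.0f yields C0000000H, and imm8 values 1 and 5 select the same element because only imm8[1:0] is used.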

If VEXTRACTPS is encoded with VEX.L=1, an attempt to execute the instruction will cause a #UD exception.

Operation 

VEXTRACTPS (EVEX and VEX.128 encoded version)
SRC_OFFSET ← IMM8[1:0]
IF (64-Bit Mode and DEST is register)
    DEST[31:0] ← (SRC[127:0] >> (SRC_OFFSET*32)) AND 0FFFFFFFFh
    DEST[63:32] ← 0
ELSE
    DEST[31:0] ← (SRC[127:0] >> (SRC_OFFSET*32)) AND 0FFFFFFFFh
FI




EXTRACTPS (128-bit Legacy SSE version)
SRC_OFFSET ← IMM8[1:0]
IF (64-Bit Mode and DEST is register)
    DEST[31:0] ← (SRC[127:0] >> (SRC_OFFSET*32)) AND 0FFFFFFFFh
    DEST[63:32] ← 0
ELSE
    DEST[31:0] ← (SRC[127:0] >> (SRC_OFFSET*32)) AND 0FFFFFFFFh
FI

Intel C/C++ Compiler Intrinsic Equivalent 

EXTRACTPS int _mm_extract_ps (__m128 a, const int nidx);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

VEX-encoded instructions, see Exceptions Type 5.
EVEX-encoded instructions, see Exceptions Type E9NF.
Additionally:

#UD If VEX.L = 1.

#UD If VEX.vvvv != 1111B or EVEX.vvvv != 1111B.
F2XM1-Compute 2^x - 1


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
D9 F0 | F2XM1 | Valid | Valid | Replace ST(0) with (2^ST(0) - 1).


Description 

Computes the exponential value of 2 to the power of the source operand minus 1. The source operand is located in 
register ST(0) and the result is also stored in ST(0). The value of the source operand must lie in the range -1.0 to 
+1.0. If the source value is outside this range, the result is undefined. 

The following table shows the results obtained when computing the exponential value of various classes of 
numbers, assuming that neither overflow nor underflow occurs. 


Table 3-16. Results Obtained from F2XM1

ST(0) SRC | ST(0) DEST
-1.0 to -0 | -0.5 to -0
-0 | -0
+0 | +0
+0 to +1.0 | +0 to 1.0


Values other than 2 can be exponentiated using the following formula:

$x^{y} = 2^{(y \cdot \log_2 x)}$
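Because F2XM1 only accepts arguments in the range -1.0 to +1.0, applying the formula above generally needs a range-reduction step. A worked sketch, assuming x > 0 and that t = y log2 x has already been computed (FYL2X can produce it); n is taken here as t rounded to the nearest integer, so the fraction f stays within F2XM1's valid range, and the factor 2^n can then be applied by scaling (for example, with FSCALE):

```latex
\begin{aligned}
x^{y} &= 2^{t}, \qquad t = y \log_2 x \quad (x > 0) \\
t &= n + f, \qquad n \in \mathbb{Z}, \quad |f| \le \tfrac{1}{2} \\
x^{y} &= 2^{n} \bigl( \mathrm{F2XM1}(f) + 1 \bigr) = 2^{n} \bigl( (2^{f} - 1) + 1 \bigr)
\end{aligned}
```

For example, with x = 3 and y = 2: t = 2 log2 3 ≈ 3.1699, so n = 3 and f ≈ 0.1699. Then 2^f - 1 = 9/8 - 1 = 0.125, and x^y = 2^3 × 1.125 = 9.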


This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(0) ← (2^ST(0) - 1);

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.
Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Source operand is an SNaN value or unsupported format. 

#D Source is a denormal value. 

#U Result is too small for destination format. 

#P Value cannot be represented exactly in destination format. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 


Compatibility Mode Exceptions

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 
FABS—Absolute Value


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
D9 E1 | FABS | Valid | Valid | Replace ST with its absolute value.


Description 

Clears the sign bit of ST(0) to create the absolute value of the operand. The following table shows the results 
obtained when creating the absolute value of various classes of numbers. 


Table 3-17. Results Obtained from FABS

ST(0) SRC | ST(0) DEST
-∞ | +∞
-F | +F
-0 | +0
+0 | +0
+F | +F
+∞ | +∞
NaN | NaN


NOTES: 


F Means finite floating-point value. 
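Since FABS only clears the sign bit, every row of Table 3-17 follows from a single bit operation. The sketch below is an approximation under a stated assumption: it uses a 64-bit IEEE-754 double (sign in bit 63) instead of the x87 80-bit double extended-precision format (sign in bit 79); `fabs_model` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Clears the sign bit and leaves exponent and significand untouched,
   so -0 becomes +0, -inf becomes +inf, and a NaN keeps its payload. */
double fabs_model(double x)
{
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    bits &= ~(UINT64_C(1) << 63);   /* sign bit of a binary64 value */
    memcpy(&x, &bits, sizeof bits);
    return x;
}
```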

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(0) ← |ST(0)|;

FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions 

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 


FADD/FADDP/FIADD-Add


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
D8 /0 | FADD m32fp | Valid | Valid | Add m32fp to ST(0) and store result in ST(0).
DC /0 | FADD m64fp | Valid | Valid | Add m64fp to ST(0) and store result in ST(0).
D8 C0+i | FADD ST(0), ST(i) | Valid | Valid | Add ST(0) to ST(i) and store result in ST(0).
DC C0+i | FADD ST(i), ST(0) | Valid | Valid | Add ST(i) to ST(0) and store result in ST(i).
DE C0+i | FADDP ST(i), ST(0) | Valid | Valid | Add ST(0) to ST(i), store result in ST(i), and pop the register stack.
DE C1 | FADDP | Valid | Valid | Add ST(0) to ST(1), store result in ST(1), and pop the register stack.
DA /0 | FIADD m32int | Valid | Valid | Add m32int to ST(0) and store result in ST(0).
DE /0 | FIADD m16int | Valid | Valid | Add m16int to ST(0) and store result in ST(0).


Description 

Adds the destination and source operands and stores the sum in the destination location. The destination operand 
is always an FPU register; the source operand can be a register or a memory location. Source operands in memory 
can be in single-precision or double-precision floating-point format or in word or doubleword integer format. 

The no-operand version of the instruction adds the contents of the ST(0) register to the ST(1) register. The one-operand version adds the contents of a memory location (either a floating-point or an integer value) to the contents of the ST(0) register. The two-operand version adds the contents of the ST(0) register to the ST(i) register or vice versa. The value in ST(0) can be doubled by coding:

FADD ST(0), ST(0); 

The FADDP instructions perform the additional operation of popping the FPU register stack after storing the result. 
To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) 
by 1. (The no-operand version of the floating-point add instructions always results in the register stack being 
popped. In some assemblers, the mnemonic for this instruction is FADD rather than FADDP.) 
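The pop step can be traced with a small model of the register stack. This is a sketch under stated assumptions: `x87_stack`, `fld_model`, `st`, and `faddp_model` are hypothetical names, values are C doubles rather than 80-bit double extended-precision values, ST(i) is taken as physical register (TOP + i) mod 8, and stack-fault checks (#IS) are not modeled.

```c
#include <stdbool.h>

typedef struct {
    double   reg[8];     /* physical data registers R0-R7 */
    bool     empty[8];   /* true when the register's tag is "empty" */
    unsigned top;        /* 3-bit TOP field, 0..7 */
} x87_stack;

static unsigned phys(const x87_stack *s, unsigned i) { return (s->top + i) & 7; }

/* Load (push): decrement TOP, then write the new ST(0). */
void fld_model(x87_stack *s, double v)
{
    s->top = (s->top - 1) & 7;
    s->reg[s->top] = v;
    s->empty[s->top] = false;
}

double st(const x87_stack *s, unsigned i) { return s->reg[phys(s, i)]; }

/* FADDP ST(i), ST(0): add, store in ST(i), then pop the register stack. */
void faddp_model(x87_stack *s, unsigned i)
{
    unsigned dst = phys(s, i);
    s->reg[dst] = s->reg[dst] + s->reg[s->top];
    s->empty[s->top] = true;       /* mark ST(0) empty ... */
    s->top = (s->top + 1) & 7;     /* ... and increment TOP by 1 */
}
```

After the pop, the register that was ST(i) becomes ST(i-1), so the no-operand FADDP leaves the sum in the new ST(0).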

The FIADD instructions convert an integer source operand to double extended-precision floating-point format 
before performing the addition. 

Table 3-18 shows the results obtained when adding various classes of numbers, assuming that neither overflow nor underflow occurs.

When the sum of two operands with opposite signs is 0, the result is +0, except for the round toward -∞ mode, in which case the result is -0. When the source operand is an integer 0, it is treated as a +0.

When both operands are infinities of the same sign, the result is ∞ of the expected sign. If both operands are infinities of opposite signs, an invalid-operation exception is generated. See Table 3-18.
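The same IEEE-754 rules can be observed with ordinary C doubles. This is an illustration, not an x87 model: it assumes the default round-to-nearest mode and binary64 arithmetic, and `sum_is_plus_zero` and `sum_is_invalid` are hypothetical helper names. Where the FPU would signal #IA, C arithmetic produces a quiet NaN.

```c
#include <math.h>
#include <stdbool.h>

/* True when a + b is exactly +0 (a zero without the sign bit set). */
bool sum_is_plus_zero(double a, double b)
{
    double s = a + b;
    return s == 0.0 && !signbit(s);
}

/* True when non-NaN operands produce a NaN sum (invalid operation),
   e.g. infinities of opposite sign. */
bool sum_is_invalid(double a, double b)
{
    return isnan(a + b) && !isnan(a) && !isnan(b);
}
```

In round-to-nearest, 1.5 + (-1.5) and -0 + +0 both give +0, while -0 + -0 keeps its sign, matching the -0 row and column of Table 3-18.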
Table 3-18. FADD/FADDP/FIADD Results

SRC \ DEST | -∞ | -F | -0 | +0 | +F | +∞ | NaN
-∞ | -∞ | -∞ | -∞ | -∞ | -∞ | * | NaN
-F or -I | -∞ | -F | SRC | SRC | ±F or ±0 | +∞ | NaN
-0 | -∞ | DEST | -0 | ±0 | DEST | +∞ | NaN
+0 | -∞ | DEST | ±0 | +0 | DEST | +∞ | NaN
+F or +I | -∞ | ±F or ±0 | SRC | SRC | +F | +∞ | NaN
+∞ | * | +∞ | +∞ | +∞ | +∞ | +∞ | NaN
NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN


NOTES: 

F Means finite floating-point value. 

I Means integer. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF Instruction = FIADD 
THEN 

DEST ← DEST + ConvertToDoubleExtendedPrecisionFP(SRC);

ELSE (* Source operand is floating-point value *)

DEST ← DEST + SRC;

FI; 

IF Instruction = FADDP 
THEN 

PopRegisterStack; 

FI; 

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.
Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Operand is an SNaN value or unsupported format.

Operands are infinities of unlike sign.

#D Source operand is a denormal value.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FBLD—Load Binary Coded Decimal

Opcode   Instruction    64-Bit Mode   Compat/Leg Mode   Description
DF /4    FBLD m80dec    Valid         Valid             Convert BCD value to floating-point and push onto the FPU stack.


Description 

Converts the BCD source operand into double extended-precision floating-point format and pushes the value onto 
the FPU stack. The source operand is loaded without rounding errors. The sign of the source operand is preserved, 
including that of -0. 

The packed BCD digits are assumed to be in the range 0 through 9; the instruction does not check for invalid digits (AH through FH). Attempting to load an invalid encoding produces an undefined result.
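The packed BCD operand that FBLD reads can be illustrated with a small decoder. This is a hedged Python sketch that assumes the usual 80-bit packed BCD layout:

- 18 digits in bytes 0 through 8, two digits per byte, with the less significant digit in the low nibble and the least significant byte first.
- The sign in bit 7 of byte 9.

`decode_packed_bcd` is a hypothetical helper name. It returns the sign separately because a Python integer cannot carry -0, which FBLD preserves. Where FBLD would produce an undefined result for digits AH through FH, the sketch rejects them instead.

```python
from typing import Tuple


def decode_packed_bcd(data: bytes) -> Tuple[bool, int]:
    """Return (is_negative, magnitude) for a 10-byte packed BCD operand."""
    if len(data) != 10:
        raise ValueError("a packed BCD operand is exactly 10 bytes")
    magnitude = 0
    for byte in reversed(data[:9]):          # byte 8 holds the two most significant digits
        hi, lo = byte >> 4, byte & 0x0F
        if hi > 9 or lo > 9:
            # FBLD itself does not check; the hardware result would be undefined.
            raise ValueError("invalid BCD digit (AH through FH)")
        magnitude = magnitude * 100 + hi * 10 + lo
    is_negative = bool(data[9] & 0x80)       # bit 7 of byte 9; bits 0-6 are ignored here
    return is_negative, magnitude
```

An 18-digit magnitude (up to 10^18 - 1) exceeds the 53-bit significand of a binary64 float but fits in the 64-bit significand of double extended precision. That is why the load can be exact, and why the sketch keeps the value as an integer.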

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

TOP ← TOP - 1;

ST(0) ← ConvertToDoubleExtendedPrecisionFP(SRC);

FPU Flags Affected 

C1 Set to 1 if stack overflow occurred; otherwise, set to 0.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack overflow occurred.

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.
Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
FBSTP—Store BCD Integer and Pop

Opcode   Instruction     64-Bit Mode   Compat/Leg Mode   Description
DF /6    FBSTP m80bcd    Valid         Valid             Store ST(0) in m80bcd and pop ST(0).


Description 

Converts the value in the ST(0) register to an 18-digit packed BCD integer, stores the result in the destination operand, and pops the register stack. If the source value is a non-integral value, it is rounded to an integer value according to the rounding mode specified by the RC field of the FPU control word. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1.

The destination operand specifies the address where the first byte of the destination value is to be stored. The BCD value (including its sign bit) requires 10 bytes of space in memory.

The following table shows the results obtained when storing various classes of numbers in packed BCD format. 


Table 3-19. FBSTP Results

ST(0)                                    DEST
-∞ or Value Too Large for DEST Format    *
F ≤ -1                                   -D
-1 < F < -0                              **
-0                                       -0
+0                                       +0
+0 < F < +1                              **
F ≥ +1                                   +D
+∞ or Value Too Large for DEST Format    *
NaN                                      *

NOTES:

F Means finite floating-point value.

D Means packed-BCD number.

* Indicates floating-point invalid-operation (#IA) exception.

** ±0 or ±1, depending on the rounding mode.

If the converted value is too large for the destination format, or if the source operand is an ∞, SNaN, or QNaN, or is in an unsupported format, an invalid-arithmetic-operand condition is signaled. If the invalid-operation exception is not masked, an invalid-arithmetic-operand exception (#IA) is generated and no value is stored in the destination operand. If the invalid-operation exception is masked, the packed BCD indefinite value is stored in memory.
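The conversion steps can be sketched as follows: round according to RC, check for 18 digits, then pack. This is a hedged Python model, not the hardware algorithm. It assumes a binary64 float as the source, so not every 18-digit integer is representable on input. `encode_packed_bcd` and `round_per_rc` are hypothetical names, and the RC encodings used are 00 nearest, 01 down, 10 up, and 11 toward zero.

```python
import math
from typing import Optional

# RC field encodings of the x87 FPU control word.
RC_NEAREST, RC_DOWN, RC_UP, RC_ZERO = 0b00, 0b01, 0b10, 0b11


def round_per_rc(x: float, rc: int) -> int:
    """Round x to an integer in the direction the RC field selects."""
    if rc == RC_NEAREST:
        return round(x)        # Python's round() on a float rounds half to even
    if rc == RC_DOWN:
        return math.floor(x)
    if rc == RC_UP:
        return math.ceil(x)
    return math.trunc(x)       # RC_ZERO: truncate toward zero


def encode_packed_bcd(x: float, rc: int = RC_NEAREST) -> Optional[bytes]:
    """Return the 10-byte packed BCD image, or None where #IA would be signaled."""
    if math.isnan(x) or math.isinf(x):
        return None                          # NaN or infinity: invalid operand
    n = round_per_rc(x, rc)
    if abs(n) > 10**18 - 1:
        return None                          # more than 18 decimal digits
    magnitude = abs(n)
    out = bytearray(10)
    for i in range(9):                       # bytes 0-8, least significant digits first
        lo = magnitude % 10
        magnitude //= 10
        hi = magnitude % 10
        magnitude //= 10
        out[i] = (hi << 4) | lo
    # The sign comes from the source, so -0.4 rounded to nearest stores -0.
    out[9] = 0x80 if math.copysign(1.0, x) < 0 else 0x00
    return bytes(out)
```

Here `None` stands for both outcomes of an invalid operand. With #IA unmasked, nothing is stored. With #IA masked, the hardware stores the packed BCD indefinite value, which this sketch does not encode.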

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

DEST ← BCD(ST(0));

PopRegisterStack;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.
Floating-Point Exceptions

#IS Stack underflow occurred.

#IA Converted value exceeds 18 BCD digits in length.

Source operand is an SNaN, QNaN, ±∞, or in an unsupported format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions 

#GP(0) If a segment register is being loaded with a segment selector that points to a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 
If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FCHS—Change Sign

Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 E0    FCHS          Valid         Valid             Complements sign of ST(0).


Description 

Complements the sign bit of ST(0). This operation changes a positive value into a negative value of equal magnitude or vice versa. The following table shows the results obtained when changing the sign of various classes of numbers.


Table 3-20. FCHS Results

ST(0) SRC    ST(0) DEST
-∞           +∞
-F           +F
-0           +0
+0           -0
+F           -F
+∞           -∞
NaN          NaN

NOTES:

F Means finite floating-point value.
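Because FCHS only flips the sign bit, it treats every class in Table 3-20 the same way, zeros and NaNs included. A minimal Python sketch follows. It assumes a binary64 value, whose sign is bit 63; in an 80-bit x87 register the sign is bit 79. `fchs` is a hypothetical name.

```python
import math
import struct


def fchs(x: float) -> float:
    """Complement the sign bit of a binary64 value, as FCHS does for ST(0)."""
    (bits,) = struct.unpack("<Q", struct.pack("<d", x))
    bits ^= 1 << 63                                   # bit 63 is the binary64 sign bit
    (result,) = struct.unpack("<d", struct.pack("<Q", bits))
    return result
```

Unlike computing `0.0 - x`, which gives +0 for x = +0 under round-to-nearest, the bit flip turns +0 into -0, as the table requires.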

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

SignBit(ST(0)) ← NOT (SignBit(ST(0)));

FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 
64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FCLEX/FNCLEX-Clear Exceptions

Opcode*     Instruction   64-Bit Mode   Compat/Leg Mode   Description
9B DB E2    FCLEX         Valid         Valid             Clear floating-point exception flags after checking for pending unmasked floating-point exceptions.
DB E2       FNCLEX*       Valid         Valid             Clear floating-point exception flags without checking for pending unmasked floating-point exceptions.

NOTES:

* See IA-32 Architecture Compatibility section below.


Description 

Clears the floating-point exception flags (PE, UE, OE, ZE, DE, and IE), the exception summary status flag (ES), the stack fault flag (SF), and the busy flag (B) in the FPU status word. The FCLEX instruction checks for and handles any pending unmasked floating-point exceptions before clearing the exception flags; the FNCLEX instruction does not.

The assembler issues two instructions for the FCLEX instruction (an FWAIT instruction followed by an FNCLEX instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the saved EIP points to the instruction that caused the exception.

IA-32 Architecture Compatibility 

When operating a Pentium or Intel486 processor in MS-DOS* compatibility mode, it is possible (under unusual circumstances) for an FNCLEX instruction to be interrupted prior to being executed to handle a pending FPU exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances. An FNCLEX instruction cannot be interrupted in this way on later Intel processors, except for the Intel Quark™ X1000 processor.

This instruction affects only the x87 FPU floating-point exception flags. It does not affect the SIMD floating-point exception flags in the MXCSR register.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

FPUStatusWord[0:7] ← 0;

FPUStatusWord[15] ← 0;
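The two status-word writes above amount to masking off bits 0 through 7 (IE, DE, ZE, OE, UE, PE, SF, ES) and bit 15 (B). The following is a minimal Python sketch of the FNCLEX half, using hypothetical names. The FWAIT check that FCLEX adds is not modelled. C0 through C3 are architecturally undefined afterwards; the sketch simply leaves them, and TOP in bits 11 through 13, unchanged, which is one permissible outcome.

```python
# Bit positions in the x87 FPU status word.
IE, DE, ZE, OE, UE, PE, SF, ES = range(8)   # exception, stack-fault, and summary flags
B = 15                                       # busy flag
TOP_SHIFT, TOP_MASK = 11, 0b111              # TOP occupies bits 11-13

CLEAR_MASK = sum(1 << bit for bit in (IE, DE, ZE, OE, UE, PE, SF, ES)) | (1 << B)


def fnclex(status_word: int) -> int:
    """Clear bits 0-7 and 15; all other bits are returned unchanged."""
    return status_word & ~CLEAR_MASK & 0xFFFF


def top_of_stack(status_word: int) -> int:
    """Extract the TOP field, which FNCLEX does not modify."""
    return (status_word >> TOP_SHIFT) & TOP_MASK
```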

FPU Flags Affected 

The PE, UE, OE, ZE, DE, IE, ES, SF, and B flags in the FPU status word are cleared. The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions 

None 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 
Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FCMOVcc—Floating-Point Conditional Move

Opcode*    Instruction            64-Bit Mode   Compat/Leg Mode*   Description
DA C0+i    FCMOVB ST(0), ST(i)    Valid         Valid              Move if below (CF=1).
DA C8+i    FCMOVE ST(0), ST(i)    Valid         Valid              Move if equal (ZF=1).
DA D0+i    FCMOVBE ST(0), ST(i)   Valid         Valid              Move if below or equal (CF=1 or ZF=1).
DA D8+i    FCMOVU ST(0), ST(i)    Valid         Valid              Move if unordered (PF=1).
DB C0+i    FCMOVNB ST(0), ST(i)   Valid         Valid              Move if not below (CF=0).
DB C8+i    FCMOVNE ST(0), ST(i)   Valid         Valid              Move if not equal (ZF=0).
DB D0+i    FCMOVNBE ST(0), ST(i)  Valid         Valid              Move if not below or equal (CF=0 and ZF=0).
DB D8+i    FCMOVNU ST(0), ST(i)   Valid         Valid              Move if not unordered (PF=0).

NOTES:

* See IA-32 Architecture Compatibility section below.


Description 

Tests the status flags in the EFLAGS register and moves the source operand (second operand) to the destination operand (first operand) if the given test condition is true. The condition for each mnemonic is given in the Description column above and in Chapter 8 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1. The source operand is always in the ST(i) register and the destination operand is always ST(0).
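The eight conditions in the opcode table can be written as predicates on CF, ZF, and PF. This is a hedged Python sketch; `FCMOV_CONDITIONS` and `fcmov` are hypothetical names, and flag values are given as 0 or 1.

```python
from typing import Callable, Dict

# Each predicate takes (CF, ZF, PF) as 0/1 integers, following the Description column.
FCMOV_CONDITIONS: Dict[str, Callable[[int, int, int], bool]] = {
    "FCMOVB":   lambda cf, zf, pf: cf == 1,
    "FCMOVE":   lambda cf, zf, pf: zf == 1,
    "FCMOVBE":  lambda cf, zf, pf: cf == 1 or zf == 1,
    "FCMOVU":   lambda cf, zf, pf: pf == 1,
    "FCMOVNB":  lambda cf, zf, pf: cf == 0,
    "FCMOVNE":  lambda cf, zf, pf: zf == 0,
    "FCMOVNBE": lambda cf, zf, pf: cf == 0 and zf == 0,
    "FCMOVNU":  lambda cf, zf, pf: pf == 0,
}


def fcmov(mnemonic: str, st0: float, sti: float, cf: int, zf: int, pf: int) -> float:
    """Return the new ST(0): ST(i) if the condition holds, otherwise ST(0) unchanged."""
    return sti if FCMOV_CONDITIONS[mnemonic](cf, zf, pf) else st0
```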

The FCMOVcc instructions are useful for optimizing small IF constructions. They also help eliminate branching 
overhead for IF operations and the possibility of branch mispredictions by the processor. 

A processor may not support the FCMOVcc instructions. Software can check whether the FCMOVcc instructions are supported by checking the processor's feature information with the CPUID instruction (see the CPUID instruction reference in this chapter). If both the CMOV and FPU feature bits are set, the FCMOVcc instructions are supported.
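The feature test above reads two bits of CPUID leaf 01H: EDX bit 0 (FPU) and EDX bit 15 (CMOV). Python cannot execute CPUID directly, so this sketch assumes the EDX value has already been obtained by some other means and only shows the bit test. The function name is hypothetical.

```python
CPUID_01H_EDX_FPU = 1 << 0    # x87 FPU on chip
CPUID_01H_EDX_CMOV = 1 << 15  # CMOVcc instructions supported


def fcmov_supported(edx: int) -> bool:
    """True when both the FPU and CMOV feature bits are set in CPUID.01H:EDX."""
    required = CPUID_01H_EDX_FPU | CPUID_01H_EDX_CMOV
    return (edx & required) == required
```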

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

The FCMOVcc instructions were introduced to the IA-32 Architecture in the P6 family processors and are not available in earlier IA-32 processors.

Operation 

IF condition TRUE
    THEN ST(0) ← ST(i);
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

Integer Flags Affected 

None. 
Protected Mode Exceptions

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FCOM/FCOMP/FCOMPP—Compare Floating Point Values

Opcode     Instruction    64-Bit Mode   Compat/Leg Mode   Description
D8 /2      FCOM m32fp     Valid         Valid             Compare ST(0) with m32fp.
DC /2      FCOM m64fp     Valid         Valid             Compare ST(0) with m64fp.
D8 D0+i    FCOM ST(i)     Valid         Valid             Compare ST(0) with ST(i).
D8 D1      FCOM           Valid         Valid             Compare ST(0) with ST(1).
D8 /3      FCOMP m32fp    Valid         Valid             Compare ST(0) with m32fp and pop register stack.
DC /3      FCOMP m64fp    Valid         Valid             Compare ST(0) with m64fp and pop register stack.
D8 D8+i    FCOMP ST(i)    Valid         Valid             Compare ST(0) with ST(i) and pop register stack.
D8 D9      FCOMP          Valid         Valid             Compare ST(0) with ST(1) and pop register stack.
DE D9      FCOMPP         Valid         Valid             Compare ST(0) with ST(1) and pop register stack twice.


Description 

Compares the contents of register ST(0) and the source value and sets condition code flags C0, C2, and C3 in the FPU status word according to the results (see Table 3-21). The source operand can be a data register or a memory location. If no source operand is given, the value in ST(0) is compared with the value in ST(1). The sign of zero is ignored, so that -0.0 is equal to +0.0.


Table 3-21. FCOM/FCOMP/FCOMPP Results

Condition      C3   C2   C0
ST(0) > SRC    0    0    0
ST(0) < SRC    0    0    1
ST(0) = SRC    1    0    0
Unordered*     1    1    1

NOTES:

* Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.

This instruction checks the class of the numbers being compared (see "FXAM—Examine Floating-Point" in this 
chapter). If either operand is a NaN or is in an unsupported format, an invalid-arithmetic-operand exception (#IA) 
is raised and, if the exception is masked, the condition flags are set to "unordered." If the invalid-arithmetic- 
operand exception is unmasked, the condition code flags are not set. 
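Table 3-21 and the unordered rule can be condensed into a small function. This is a hedged Python sketch with a hypothetical name. It assumes binary64 operands and treats every NaN as triggering #IA, which is FCOM's behavior. Unsupported formats cannot occur in binary64 and are not modelled. `None` stands for "condition codes not written".

```python
import math
from typing import Optional, Tuple


def fcom_condition_codes(st0: float, src: float,
                         invalid_masked: bool = True) -> Optional[Tuple[int, int, int]]:
    """Return (C3, C2, C0) as FCOM sets them, or None if an unmasked #IA leaves them unset."""
    if math.isnan(st0) or math.isnan(src):
        # FCOM raises #IA for any NaN; "unordered" is written only when #IA is masked.
        return (1, 1, 1) if invalid_masked else None
    if st0 > src:
        return (0, 0, 0)
    if st0 < src:
        return (0, 0, 1)
    return (1, 0, 0)  # equal; -0.0 == +0.0 here, as on the FPU
```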

The FCOMP instruction pops the register stack following the comparison operation and the FCOMPP instruction 
pops the register stack twice following the comparison operation. To pop the register stack, the processor marks 
the ST(0) register as empty and increments the stack pointer (TOP) by 1. 
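The pop described above can be sketched against the tag word. This Python model is hypothetical. It assumes the tags are held as a list indexed by physical register, using the x87 2-bit encoding (00 valid, 01 zero, 10 special, 11 empty), and that ST(0) is physical register R[TOP].

```python
from typing import List, Tuple

TAG_EMPTY = 0b11  # 2-bit tag meaning "empty"


def pop_register_stack(top: int, tags: List[int]) -> Tuple[int, List[int]]:
    """Mark ST(0), i.e. physical register R[TOP], empty and increment TOP modulo 8."""
    new_tags = list(tags)
    new_tags[top] = TAG_EMPTY
    return (top + 1) % 8, new_tags


def pop_twice(top: int, tags: List[int]) -> Tuple[int, List[int]]:
    """The two pops FCOMPP performs after its comparison."""
    for _ in range(2):
        top, tags = pop_register_stack(top, tags)
    return top, tags
```

The register contents themselves are not cleared; only the tag changes, so a popped value becomes inaccessible rather than erased.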

The FCOM instructions perform the same operation as the FUCOM instructions. The only difference is how they 
handle QNaN operands. The FCOM instructions raise an invalid-arithmetic-operand exception (#IA) when either or 
both of the operands is a NaN value or is in an unsupported format. The FUCOM instructions perform the same 
operation as the FCOM instructions, except that they do not generate an invalid-arithmetic-operand exception for 
QNaNs. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 
Operation

CASE (relation of operands) OF
    ST > SRC: C3, C2, C0 ← 000;
    ST < SRC: C3, C2, C0 ← 001;
    ST = SRC: C3, C2, C0 ← 100;
ESAC;

IF ST(0) or SRC = NaN or unsupported format
    THEN
        #IA
        IF FPUControlWord.IM = 1
            THEN
                C3, C2, C0 ← 111;
        FI;
FI;

IF Instruction = FCOMP
    THEN
        PopRegisterStack;
FI;

IF Instruction = FCOMPP
    THEN
        PopRegisterStack;
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 See Table 3-21.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA One or both operands are NaN values or have unsupported formats. 

Register is marked empty. 

#D One or both operands are denormal values. 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 
Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
FCOMI/FCOMIP/FUCOMI/FUCOMIP-Compare Floating Point Values and Set EFLAGS

Opcode     Instruction         64-Bit Mode   Compat/Leg Mode   Description
DB F0+i    FCOMI ST, ST(i)     Valid         Valid             Compare ST(0) with ST(i) and set status flags accordingly.
DF F0+i    FCOMIP ST, ST(i)    Valid         Valid             Compare ST(0) with ST(i), set status flags accordingly, and pop register stack.
DB E8+i    FUCOMI ST, ST(i)    Valid         Valid             Compare ST(0) with ST(i), check for ordered values, and set status flags accordingly.
DF E8+i    FUCOMIP ST, ST(i)   Valid         Valid             Compare ST(0) with ST(i), check for ordered values, set status flags accordingly, and pop register stack.


Description 

Performs an unordered comparison of the contents of registers ST(0) and ST(i) and sets the status flags ZF, PF, and CF in the EFLAGS register according to the results (see the table below). The sign of zero is ignored for comparisons, so that -0.0 is equal to +0.0.


Table 3-22. FCOMI/FCOMIP/FUCOMI/FUCOMIP Results

Comparison Results*   ZF   PF   CF
ST0 > ST(i)           0    0    0
ST0 < ST(i)           0    0    1
ST0 = ST(i)           1    0    0
Unordered**           1    1    1

NOTES:

* See the IA-32 Architecture Compatibility section below.

** Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.


An unordered comparison checks the class of the numbers being compared (see "FXAM—Examine Floating-Point" in this chapter). The FUCOMI/FUCOMIP instructions perform the same operations as the FCOMI/FCOMIP instructions. The only difference is that the FUCOMI/FUCOMIP instructions raise the invalid-arithmetic-operand exception (#IA) only when either or both operands are an SNaN or are in an unsupported format; QNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be generated. The FCOMI/FCOMIP instructions raise an invalid-operation exception when either or both of the operands are a NaN value of any kind or are in an unsupported format.

If the operation results in an invalid-arithmetic-operand exception being raised, the status flags in the EFLAGS 
register are set only if the exception is masked. 

The FCOMI/FCOMIP and FUCOMI/FUCOMIP instructions set the OF, SF and AF flags to zero in the EFLAGS register 
(regardless of whether an invalid-operation exception is detected). 
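The EFLAGS rules above can be sketched as a function that returns an updated flag dictionary. This Python model is hypothetical and simplified. It treats every Python NaN as a QNaN, so SNaNs and unsupported formats are not modelled; with that assumption the FCOMI and FUCOMI variants differ only in whether a NaN raises #IA.

```python
import math
from typing import Dict


def fcomi(st0: float, sti: float, eflags: Dict[str, int],
          unordered_variant: bool = False, invalid_masked: bool = True) -> Dict[str, int]:
    """Return EFLAGS after FCOMI (unordered_variant=False) or FUCOMI (True).

    Every Python NaN is treated as a QNaN; SNaNs and unsupported formats are not modelled.
    """
    out = dict(eflags)
    out.update(OF=0, SF=0, AF=0)             # cleared whether or not #IA is detected
    if math.isnan(st0) or math.isnan(sti):
        raises_ia = not unordered_variant    # FUCOMI does not raise #IA for a QNaN
        if raises_ia and not invalid_masked:
            return out                       # unmasked #IA: ZF, PF, CF are left unchanged
        out.update(ZF=1, PF=1, CF=1)         # unordered
    elif st0 > sti:
        out.update(ZF=0, PF=0, CF=0)
    elif st0 < sti:
        out.update(ZF=0, PF=0, CF=1)
    else:
        out.update(ZF=1, PF=0, CF=0)         # equal, including -0.0 versus +0.0
    return out
```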

The FCOMIP and FUCOMIP instructions also pop the register stack following the comparison operation. To pop the 
register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

The FCOMI/FCOMIP/FUCOMI/FUCOMIP instructions were introduced to the IA-32 Architecture in the P6 family 
processors and are not available in earlier IA-32 processors. 
Operation

CASE (relation of operands) OF
    ST(0) > ST(i): ZF, PF, CF ← 000;
    ST(0) < ST(i): ZF, PF, CF ← 001;
    ST(0) = ST(i): ZF, PF, CF ← 100;
ESAC;

IF Instruction is FCOMI or FCOMIP
    THEN
        IF ST(0) or ST(i) = NaN or unsupported format
            THEN
                #IA
                IF FPUControlWord.IM = 1
                    THEN
                        ZF, PF, CF ← 111;
                FI;
        FI;
FI;

IF Instruction is FUCOMI or FUCOMIP
    THEN
        IF ST(0) or ST(i) = QNaN, but not SNaN or unsupported format
            THEN
                ZF, PF, CF ← 111;
            ELSE (* ST(0) or ST(i) is SNaN or unsupported format *)
                #IA;
                IF FPUControlWord.IM = 1
                    THEN
                        ZF, PF, CF ← 111;
                FI;
        FI;
FI;

IF Instruction is FCOMIP or FUCOMIP
    THEN
        PopRegisterStack;
FI;


FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 Not affected.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA (FCOMI or FCOMIP instruction) One or both operands are NaN values or have unsupported formats.

(FUCOMI or FUCOMIP instruction) One or both operands are SNaN values (but not QNaNs) or have undefined formats. Detection of a QNaN value does not raise an invalid-operand exception.
Protected Mode Exceptions

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FCOS—Cosine

Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 FF    FCOS          Valid         Valid             Replace ST(0) with its approximate cosine.


Description 

Computes the approximate cosine of the source operand in register ST(0) and stores the result in ST(0). The source operand must be given in radians and must be within the range -2^63 to +2^63. The following table shows the results obtained when taking the cosine of various classes of numbers.


Table 3-23. FCOS Results

ST(0) SRC    ST(0) DEST
-∞           *
-F           -1 to +1
-0           +1
+0           +1
+F           -1 to +1
+∞           *
NaN          NaN

NOTES:

F Means finite floating-point value.

* Indicates floating-point invalid-arithmetic-operand (#IA) exception.

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range -2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π. However, even within the range -2^63 to +2^63, inaccurate results can occur because the finite approximation of π used internally for argument reduction is not sufficient in all cases. Therefore, for accurate results it is safe to apply FCOS only to arguments reduced accurately in software, to a value smaller in absolute value than 3π/8. See the sections titled "Approximation of Pi" and "Transcendental Instruction Accuracy" in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in performing such reductions.
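The out-of-range convention (C2 set, ST(0) unchanged, no exception) can be shown with a small model. This Python sketch is hypothetical. It uses `math.cos`, the platform libm cosine, in place of the x87 approximation, so the values it returns for large arguments need not match the hardware. Only finite inputs are modelled.

```python
import math
from typing import Tuple

FCOS_LIMIT = 2.0 ** 63   # FCOS accepts only |ST(0)| < 2^63


def fcos_model(st0: float) -> Tuple[float, int]:
    """Return (new ST(0), C2) for a finite source operand."""
    if not math.isfinite(st0):
        # Infinities signal #IA and NaNs propagate; neither case is modelled here.
        raise ValueError("non-finite operands are outside this sketch")
    if abs(st0) < FCOS_LIMIT:
        return math.cos(st0), 0      # in range: ST(0) replaced, C2 cleared
    return st0, 1                    # out of range: ST(0) unchanged, C2 set, no exception
```

A caller that sees C2 = 1 is expected to reduce the argument itself, ideally with more precision than a binary64 value of 2π offers, and then retry.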

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF |ST(0)| < 2^63
    THEN
        C2 ← 0;
        ST(0) ← FCOS(ST(0)); // approximation of cosine
    ELSE (* Source operand is out-of-range *)
        C2 ← 1;
FI;
FPU Flags Affected

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

Undefined if C2 is 1.

C2 Set to 1 if outside range (-2^63 < source operand < +2^63); otherwise, set to 0.

C0, C3 Undefined.


Floating-Point Exceptions

#IS Stack underflow occurred.

#IA Source operand is an SNaN value, ∞, or unsupported format.

#D Source is a denormal value.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FDECSTP—Decrement Stack-Top Pointer

Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 F6    FDECSTP       Valid         Valid             Decrement TOP field in FPU status word.


Description 

Subtracts one from the TOP field of the FPU status word (decrements the top-of-stack pointer). If the TOP field 
contains a 0, it is set to 7. The effect of this instruction is to rotate the stack by one position. The contents of the 
FPU data registers and tag register are not affected. 
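The rotation can be made concrete with the relation ST(i) = R[(TOP + i) mod 8]. This hedged Python sketch uses hypothetical names. It operates on a 16-bit status word with TOP in bits 11 through 13 and C1 in bit 9. It clears C1 as the flags section specifies and leaves the undefined C0, C2, and C3 unchanged.

```python
TOP_SHIFT = 11
TOP_FIELD = 0b111 << TOP_SHIFT     # bits 11-13 of the status word
C1_BIT = 1 << 9


def fdecstp(status_word: int) -> int:
    """Decrement TOP modulo 8 and clear C1; data registers and tags are not touched."""
    top = (status_word >> TOP_SHIFT) & 0b111
    new_top = (top - 1) % 8              # 0 wraps to 7
    cleared = status_word & ~(TOP_FIELD | C1_BIT) & 0xFFFF
    return cleared | (new_top << TOP_SHIFT)


def physical_register(top: int, i: int) -> int:
    """ST(i) names physical register R[(TOP + i) mod 8]."""
    return (top + i) % 8
```

With TOP = 3, ST(0) is R3. After FDECSTP, TOP = 2, so R3 is now addressed as ST(1), which is the one-position rotation described above.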

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF TOP = 0
    THEN TOP ← 7;
    ELSE TOP ← TOP - 1;
FI;

FPU Flags Affected 

The C1 flag is set to 0. The C0, C2, and C3 flags are undefined.

Floating-Point Exceptions 

None. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FDIV/FDIVP/FIDIV-Divide

Opcode     Instruction          64-Bit Mode   Compat/Leg Mode   Description
D8 /6      FDIV m32fp           Valid         Valid             Divide ST(0) by m32fp and store result in ST(0).
DC /6      FDIV m64fp           Valid         Valid             Divide ST(0) by m64fp and store result in ST(0).
D8 F0+i    FDIV ST(0), ST(i)    Valid         Valid             Divide ST(0) by ST(i) and store result in ST(0).
DC F8+i    FDIV ST(i), ST(0)    Valid         Valid             Divide ST(i) by ST(0) and store result in ST(i).
DE F8+i    FDIVP ST(i), ST(0)   Valid         Valid             Divide ST(i) by ST(0), store result in ST(i), and pop the register stack.
DE F9      FDIVP                Valid         Valid             Divide ST(1) by ST(0), store result in ST(1), and pop the register stack.
DA /6      FIDIV m32int         Valid         Valid             Divide ST(0) by m32int and store result in ST(0).
DE /6      FIDIV m16int         Valid         Valid             Divide ST(0) by m16int and store result in ST(0).


Description 

Divides the destination operand by the source operand and stores the result in the destination location. The destination operand (dividend) is always in an FPU register; the source operand (divisor) can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format, word or doubleword integer format.

The no-operand version of the instruction divides the contents of the ST(1) register by the contents of the ST(0) register. The one-operand version divides the contents of the ST(0) register by the contents of a memory location (either a floating-point or an integer value). The two-operand version divides the contents of the ST(0) register by the contents of the ST(i) register or vice versa.

The FDIVP instructions perform the additional operation of popping the FPU register stack after storing the result. 
To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) 
by 1. The no-operand version of the floating-point divide instructions always results in the register stack being 
popped. In some assemblers, the mnemonic for this instruction is FDIV rather than FDIVP. 

The FIDIV instructions convert an integer source operand to double extended-precision floating-point format 
before performing the division. When the source operand is an integer 0, it is treated as a +0. 

If an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.
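To make the special cases concrete, here is a minimal Python sketch of the division rules, assuming all exceptions are masked. The helper name `x87_fdiv` and its flag strings are hypothetical, Python's binary64 floats stand in for double extended precision, and SNaN, denormal, overflow, underflow, and precision handling are left out. Note that an infinite dividend divided by ±0 yields ∞ without #Z, as the results table shows.

```python
import math


def x87_fdiv(dest: float, src: float) -> tuple[float, set[str]]:
    """Model DEST / SRC with all x87 exceptions masked (hypothetical helper).

    Returns the value stored in DEST and the set of exception flags raised.
    Only the special-operand cases of the results table are modeled.
    """
    flags: set[str] = set()
    if math.isnan(dest) or math.isnan(src):
        return math.nan, flags  # a QNaN operand propagates (SNaN handling not modeled)
    # The result sign is the XOR of the operand signs, including signed zeros.
    negative = (math.copysign(1.0, dest) < 0) != (math.copysign(1.0, src) < 0)
    if (math.isinf(dest) and math.isinf(src)) or (dest == 0.0 and src == 0.0):
        flags.add("IA")  # ±∞/±∞ and ±0/±0 are invalid operations
        return math.nan, flags  # masked response: a QNaN (the indefinite)
    if src == 0.0:
        if not math.isinf(dest):
            flags.add("Z")  # finite nonzero DEST / ±0 signals zero-divide
        return (-math.inf if negative else math.inf), flags
    if math.isinf(src):
        return (-0.0 if negative else 0.0), flags  # finite or zero DEST / ±∞
    if math.isinf(dest):
        return (-math.inf if negative else math.inf), flags
    return dest / src, flags  # ordinary finite division
```

For FIDIV, an integer 0 converts to +0.0, so `x87_fdiv(-3.0, float(0))` stores -∞ and raises only the zero-divide flag.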

The following table shows the results obtained when dividing various classes of numbers, assuming that neither 
overflow nor underflow occurs. 
Table 3-24. FDIV/FDIVP/FIDIV Results

                                DEST
            -∞     -F     -0     +0     +F     +∞     NaN
      -∞    *      +0     +0     -0     -0     *      NaN
      -F    +∞     +F     +0     -0     -F     -∞     NaN
      -I    +∞     +F     +0     -0     -F     -∞     NaN
      -0    +∞     **     *      *      **     -∞     NaN
SRC   +0    -∞     **     *      *      **     +∞     NaN
      +I    -∞     -F     -0     +0     +F     +∞     NaN
      +F    -∞     -F     -0     +0     +F     +∞     NaN
      +∞    *      -0     -0     +0     +0     *      NaN
      NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN

NOTES: 

F Means finite floating-point value. 

I Means Integer. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

** Indicates floating-point zero-divide (#Z) exception. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF SRC = 0
    THEN
        #Z;
    ELSE
        IF Instruction is FIDIV
            THEN
                DEST ← DEST / ConvertToDoubleExtendedPrecisionFP(SRC);
            ELSE (* Source operand is floating-point value *)
                DEST ← DEST / SRC;
        FI;
FI;
IF Instruction = FDIVP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.
Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.


Floating-Point Exceptions

#IS Stack underflow occurred.

#IA Operand is an SNaN value or unsupported format.
±∞ / ±∞; ±0 / ±0

#D Source is a denormal value.

#Z DEST / ±0, where DEST is not equal to ±0.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions 


#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 
FDIVR/FDIVRP/FIDIVR-Reverse Divide


Opcode    Instruction           64-Bit Mode   Compat/Leg Mode   Description
D8 /7     FDIVR m32fp           Valid         Valid             Divide m32fp by ST(0) and store result in ST(0).
DC /7     FDIVR m64fp           Valid         Valid             Divide m64fp by ST(0) and store result in ST(0).
D8 F8+i   FDIVR ST(0), ST(i)    Valid         Valid             Divide ST(i) by ST(0) and store result in ST(0).
DC F0+i   FDIVR ST(i), ST(0)    Valid         Valid             Divide ST(0) by ST(i) and store result in ST(i).
DE F0+i   FDIVRP ST(i), ST(0)   Valid         Valid             Divide ST(0) by ST(i), store result in ST(i), and pop the register stack.
DE F1     FDIVRP                Valid         Valid             Divide ST(0) by ST(1), store result in ST(1), and pop the register stack.
DA /7     FIDIVR m32int         Valid         Valid             Divide m32int by ST(0) and store result in ST(0).
DE /7     FIDIVR m16int         Valid         Valid             Divide m16int by ST(0) and store result in ST(0).


Description 

Divides the source operand by the destination operand and stores the result in the destination location. The destination operand (divisor) is always in an FPU register; the source operand (dividend) can be a register or a memory location. Source operands in memory can be in single-precision or double-precision floating-point format, word or doubleword integer format.

These instructions perform the reverse operations of the FDIV, FDIVP, and FIDIV instructions. They are provided to 
support more efficient coding. 
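The operand-order difference is easiest to see on a toy model. In the sketch below the helper names are hypothetical, the register stack is a Python list with index 0 as ST(0), tags and TOP are not modeled, and binary64 floats stand in for double extended precision. It contrasts FDIV and FDIVR with the same operands, and the two popping no-operand forms.

```python
def fdiv_st0_sti(stack: list[float], i: int) -> None:
    """FDIV ST(0), ST(i): ST(0) <- ST(0) / ST(i)."""
    stack[0] = stack[0] / stack[i]


def fdivr_st0_sti(stack: list[float], i: int) -> None:
    """FDIVR ST(0), ST(i): ST(0) <- ST(i) / ST(0)."""
    stack[0] = stack[i] / stack[0]


def fdivp(stack: list[float]) -> None:
    """No-operand FDIVP: ST(1) <- ST(1) / ST(0), then pop."""
    stack[1] = stack[1] / stack[0]
    stack.pop(0)  # the old ST(0) is discarded; the result becomes the new ST(0)


def fdivrp(stack: list[float]) -> None:
    """No-operand FDIVRP: ST(1) <- ST(0) / ST(1), then pop."""
    stack[1] = stack[0] / stack[1]
    stack.pop(0)
```

With the divisor 2.0 already in ST(0) and the dividend 10.0 in ST(1), `fdivr_st0_sti(stack, 1)` leaves 5.0 in ST(0) directly, whereas the FDIV form would first need an FXCH to swap the operands.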

The no-operand version of the instruction divides the contents of the ST(0) register by the contents of the ST(1) register. The one-operand version divides the contents of a memory location (either a floating-point or an integer value) by the contents of the ST(0) register. The two-operand version divides the contents of the ST(i) register by the contents of the ST(0) register or vice versa.

The FDIVRP instructions perform the additional operation of popping the FPU register stack after storing the result. 
To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) 
by 1. The no-operand version of the floating-point divide instructions always results in the register stack being 
popped. In some assemblers, the mnemonic for this instruction is FDIVR rather than FDIVRP. 

The FIDIVR instructions convert an integer source operand to double extended-precision floating-point format 
before performing the division. 

If an unmasked divide-by-zero exception (#Z) is generated, no result is stored; if the exception is masked, an ∞ of the appropriate sign is stored in the destination operand.

The following table shows the results obtained when dividing various classes of numbers, assuming that neither 
overflow nor underflow occurs. 
Table 3-25. FDIVR/FDIVRP/FIDIVR Results

                                DEST
            -∞     -F     -0     +0     +F     +∞     NaN
      -∞    *      +∞     +∞     -∞     -∞     *      NaN
      -F    +0     +F     **     **     -F     -0     NaN
      -I    +0     +F     **     **     -F     -0     NaN
      -0    +0     +0     *      *      -0     -0     NaN
SRC   +0    -0     -0     *      *      +0     +0     NaN
      +I    -0     -F     **     **     +F     +0     NaN
      +F    -0     -F     **     **     +F     +0     NaN
      +∞    *      -∞     -∞     +∞     +∞     *      NaN
      NaN   NaN    NaN    NaN    NaN    NaN    NaN    NaN


NOTES: 

F Means finite floating-point value. 

I Means integer. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

** Indicates floating-point zero-divide (#Z) exception. 

When the source operand is an integer 0, it is treated as a +0. This instruction's operation is the same in non-64-bit modes and 64-bit mode.

Operation 

IF DEST = 0
    THEN
        #Z;
    ELSE
        IF Instruction = FIDIVR
            THEN
                DEST ← ConvertToDoubleExtendedPrecisionFP(SRC) / DEST;
            ELSE (* Source operand is floating-point value *)
                DEST ← SRC / DEST;
        FI;
FI;
IF Instruction = FDIVRP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.
Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.
Floating-Point Exceptions

#IS Stack underflow occurred.

#IA Operand is an SNaN value or unsupported format.
±∞ / ±∞; ±0 / ±0

#D Source is a denormal value.

#Z SRC / ±0, where SRC is not equal to ±0.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 
FFREE—Free Floating-Point Register


Opcode    Instruction   64-Bit Mode   Compat/Leg Mode   Description
DD C0+i   FFREE ST(i)   Valid         Valid             Sets tag for ST(i) to empty.


Description 

Sets the tag in the FPU tag register associated with register ST(i) to empty (11B). The contents of ST(i) and the FPU stack-top pointer (TOP) are not affected.
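A small sketch of the tag-word update, assuming the usual x87 layout in which the 16-bit tag word holds two bits per physical register (R0 in bits 1:0 through R7 in bits 15:14) and ST(i) names physical register (TOP + i) mod 8. The function name `ffree` is hypothetical; only the tag bits change, matching the description above.

```python
EMPTY = 0b11  # tag value meaning "empty"


def ffree(tag_word: int, top: int, i: int) -> int:
    """Return the tag word after FFREE ST(i) (hypothetical helper).

    TOP and the register contents are deliberately left untouched.
    """
    phys = (top + i) % 8  # relative ST(i) -> physical register number
    return tag_word | (EMPTY << (2 * phys))  # force both tag bits to 11B
```

For example, with TOP = 6, `ffree(0x0000, 6, 1)` frees physical register R7 and returns 0xC000.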

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

TAG(i) ← 11B;

FPU Flags Affected 

C0, C1, C2, C3 undefined.

Floating-Point Exceptions 

None 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FICOM/FICOMP—Compare Integer


Opcode   Instruction     64-Bit Mode   Compat/Leg Mode   Description
DE /2    FICOM m16int    Valid         Valid             Compare ST(0) with m16int.
DA /2    FICOM m32int    Valid         Valid             Compare ST(0) with m32int.
DE /3    FICOMP m16int   Valid         Valid             Compare ST(0) with m16int and pop stack register.
DA /3    FICOMP m32int   Valid         Valid             Compare ST(0) with m32int and pop stack register.


Description 

Compares the value in ST(0) with an integer source operand and sets the condition code flags C0, C2, and C3 in the FPU status word according to the results (see table below). The integer value is converted to double extended-precision floating-point format before the comparison is made.


Table 3-26. FICOM/FICOMP Results

Condition      C3   C2   C0
ST(0) > SRC    0    0    0
ST(0) < SRC    0    0    1
ST(0) = SRC    1    0    0
Unordered      1    1    1


These instructions perform an "unordered comparison." An unordered comparison also checks the class of the 
numbers being compared (see "FXAM—Examine Floating-Point" in this chapter). If either operand is a NaN or is in 
an undefined format, the condition flags are set to "unordered." 

The sign of zero is ignored, so that -0.0 = +0.0.
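The table and the zero-sign rule can be modeled in a few lines. This is a sketch only: `ficom_flags` is a hypothetical name, Python's `float()` stands in for conversion to double extended precision (exact for the 16- and 32-bit integers FICOM accepts), and the unsupported-format case is not modeled.

```python
import math


def ficom_flags(st0: float, src: int) -> tuple[int, int, int]:
    """Return (C3, C2, C0) for FICOM ST(0), SRC (hypothetical helper)."""
    value = float(src)  # the integer operand is converted first; 0 becomes +0.0
    if math.isnan(st0):
        return (1, 1, 1)  # unordered
    if st0 > value:
        return (0, 0, 0)
    if st0 < value:
        return (0, 0, 1)
    return (1, 0, 0)  # equal; -0.0 == +0.0, so the sign of zero is ignored
```

A NaN in ST(0) is the only way to reach the unordered row here, because an integer source can never be a NaN.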

The FICOMP instructions pop the register stack following the comparison. To pop the register stack, the processor 
marks the ST(0) register empty and increments the stack pointer (TOP) by 1. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

CASE (relation of operands) OF
    ST(0) > SRC:    C3, C2, C0 ← 000;
    ST(0) < SRC:    C3, C2, C0 ← 001;
    ST(0) = SRC:    C3, C2, C0 ← 100;
    Unordered:      C3, C2, C0 ← 111;
ESAC;
IF Instruction = FICOMP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 See Table 3-26.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA One or both operands are NaN values or have unsupported formats. 

#D One or both operands are denormal values. 
Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 
FILD—Load Integer


Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
DF /0    FILD m16int   Valid         Valid             Push m16int onto the FPU register stack.
DB /0    FILD m32int   Valid         Valid             Push m32int onto the FPU register stack.
DF /5    FILD m64int   Valid         Valid             Push m64int onto the FPU register stack.


Description 

Converts the signed-integer source operand into double extended-precision floating-point format and pushes the 
value onto the FPU register stack. The source operand can be a word, doubleword, or quadword integer. It is loaded 
without rounding errors. The sign of the source operand is preserved. 
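FILD never rounds because every 16-, 32-, and 64-bit signed integer fits in the 64-bit significand of the double extended-precision format. The short check below (the helper `fits_in_significand` is hypothetical) contrasts that with the 53-bit significand of double precision, where large 64-bit integers do round.

```python
def fits_in_significand(n: int, significand_bits: int) -> bool:
    """True if integer n is exactly representable with the given significand width.

    Trailing zero bits are absorbed by the exponent, so only the span from the
    highest to the lowest set bit has to fit in the significand.
    """
    m = abs(n)
    if m == 0:
        return True
    m >>= (m & -m).bit_length() - 1  # strip trailing zero bits
    return m.bit_length() <= significand_bits


INT64_MIN, INT64_MAX = -2**63, 2**63 - 1
```

`fits_in_significand(INT64_MAX, 64)` holds, while `fits_in_significand(INT64_MAX, 53)` does not; Python's own `float(2**53 + 1)` shows the resulting rounding in binary64.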

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

TOP ← TOP - 1;
ST(0) ← ConvertToDoubleExtendedPrecisionFP(SRC);

FPU Flags Affected 

C1 Set to 1 if stack overflow occurred; set to 0 otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack overflow occurred.


Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.
Compatibility Mode Exceptions

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.
FINCSTP—Increment Stack-Top Pointer


Opcode   Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 F7    FINCSTP       Valid         Valid             Increment the TOP field in the FPU status register.


Description 

Adds one to the TOP field of the FPU status word (increments the top-of-stack pointer). If the TOP field contains a 
7, it is set to 0. The effect of this instruction is to rotate the stack by one position. The contents of the FPU data 
registers and tag register are not affected. This operation is not equivalent to popping the stack, because the tag 
for the previous top-of-stack register is not marked empty. 
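The difference between FINCSTP and a true pop fits in two small functions. Both names are hypothetical, and the tag-word layout (two bits per physical register, TOP naming the physical register that is ST(0)) is an assumption of this sketch.

```python
def fincstp(top: int) -> int:
    """TOP after FINCSTP: add one modulo 8, so 7 wraps to 0. Tags are untouched."""
    return (top + 1) & 0b111


def pop_register_stack(top: int, tag_word: int) -> tuple[int, int]:
    """TOP and tag word after a pop: tag the old ST(0) empty (11B), then increment TOP."""
    tag_word |= 0b11 << (2 * top)
    return fincstp(top), tag_word
```

Both advance TOP identically; only the pop also marks the old top-of-stack register empty, which is exactly the distinction drawn above.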

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF TOP = 7
    THEN TOP ← 0;
    ELSE TOP ← TOP + 1;
FI;

FPU Flags Affected 

The C1 flag is set to 0. The C0, C2, and C3 flags are undefined.

Floating-Point Exceptions 

None 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 


FINIT/FNINIT—Initialize Floating-Point Unit


Opcode     Instruction   64-Bit Mode   Compat/Leg Mode   Description
9B DB E3   FINIT         Valid         Valid             Initialize FPU after checking for pending unmasked floating-point exceptions.
DB E3      FNINIT*       Valid         Valid             Initialize FPU without checking for pending unmasked floating-point exceptions.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Sets the FPU control, status, tag, instruction pointer, and data pointer registers to their default states. The FPU control word is set to 037FH (round to nearest, all exceptions masked, 64-bit precision). The status word is cleared (no exception flags set, TOP is set to 0). The data registers in the register stack are left unchanged, but they are all tagged as empty (11B). Both the instruction and data pointers are cleared.
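Decoding 037FH field by field shows where "round to nearest, all exceptions masked, 64-bit precision" comes from. The sketch assumes the standard x87 control-word layout (exception masks IM, DM, ZM, OM, UM, PM in bits 0 through 5, precision control PC in bits 9:8, rounding control RC in bits 11:10); the function name and the descriptive labels are hypothetical.

```python
def decode_fpu_control_word(cw: int) -> dict[str, object]:
    """Split an x87 FPU control word into its main fields (hypothetical helper)."""
    masks = ["IM", "DM", "ZM", "OM", "UM", "PM"]  # bits 0-5: exception masks
    pc_names = {0b00: "24-bit (single)", 0b01: "reserved",
                0b10: "53-bit (double)", 0b11: "64-bit (double extended)"}
    rc_names = {0b00: "round to nearest (even)", 0b01: "round down",
                0b10: "round up", 0b11: "round toward zero"}
    return {
        "masked": [name for bit, name in enumerate(masks) if cw >> bit & 1],
        "precision": pc_names[cw >> 8 & 0b11],   # PC field, bits 9:8
        "rounding": rc_names[cw >> 10 & 0b11],   # RC field, bits 11:10
    }
```

Likewise, the tag word FFFFH set by FINIT is simply eight 11B (empty) tags.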

The FINIT instruction checks for and handles any pending unmasked floating-point exceptions before performing 
the initialization; the FNINIT instruction does not. 

The assembler issues two instructions for the FINIT instruction (an FWAIT instruction followed by an FNINIT instruction), and the processor executes each of these instructions separately. If an exception is generated for either of these instructions, the saved EIP points to the instruction that caused the exception.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual circumstances) for an FNINIT instruction to be interrupted prior to being executed to handle a pending FPU exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances. An FNINIT instruction cannot be interrupted in this way on later Intel processors, except for the Intel Quark™ X1000 processor.

In the Intel387 math coprocessor, the FINIT/FNINIT instruction does not clear the instruction and data pointers. 
This instruction affects only the x87 FPU. It does not affect the XMM and MXCSR registers. 

Operation 

FPUControlWord ← 037FH;
FPUStatusWord ← 0;
FPUTagWord ← FFFFH;
FPUDataPointer ← 0;
FPUInstructionPointer ← 0;
FPULastInstructionOpcode ← 0;

FPU Flags Affected 

C0, C1, C2, C3 set to 0.

Floating-Point Exceptions 

None 
Protected Mode Exceptions

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 
FIST/FISTP-Store Integer


Opcode   Instruction    64-Bit Mode   Compat/Leg Mode   Description
DF /2    FIST m16int    Valid         Valid             Store ST(0) in m16int.
DB /2    FIST m32int    Valid         Valid             Store ST(0) in m32int.
DF /3    FISTP m16int   Valid         Valid             Store ST(0) in m16int and pop register stack.
DB /3    FISTP m32int   Valid         Valid             Store ST(0) in m32int and pop register stack.
DF /7    FISTP m64int   Valid         Valid             Store ST(0) in m64int and pop register stack.


Description 

The FIST instruction converts the value in the ST(0) register to a signed integer and stores the result in the destination operand. Values can be stored in word or doubleword integer format. The destination operand specifies the address where the first byte of the destination value is to be stored.

The FISTP instruction performs the same operation as the FIST instruction and then pops the register stack. To pop 
the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. 
The FISTP instruction also stores values in quadword integer format. 

The following table shows the results obtained when storing various classes of numbers in integer format. 


Table 3-27. FIST/FISTP Results

ST(0)                                     DEST
-∞ or Value Too Large for DEST Format     *
F ≤ -1                                    -I
-1 < F < -0                               **
-0                                        0
+0                                        0
+0 < F < +1                               **
F ≥ +1                                    +I
+∞ or Value Too Large for DEST Format     *
NaN                                       *

NOTES:

F Means finite floating-point value.

I Means integer.

* Indicates floating-point invalid-operation (#IA) exception.

** 0 or ±1, depending on the rounding mode.


If the source value is a non-integral value, it is rounded to an integer value, according to the rounding mode specified by the RC field of the FPU control word.

If the converted value is too large for the destination format, or if the source operand is an SNaN, QNaN, or is in an unsupported format, an invalid-arithmetic-operand condition is signaled. If the invalid-operation exception is not masked, an invalid-arithmetic-operand exception (#IA) is generated and no value is stored in the destination operand. If the invalid-operation exception is masked, the integer indefinite value is stored in memory.
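The interplay of the RC field, the range check, and the masked invalid response can be sketched as follows. Assumptions: the helper `fist` and its `rc` strings are hypothetical, the invalid-operation exception is masked, the integer indefinite is the most negative value of each destination size, binary64 floats stand in for ST(0), and the pop and #P signaling are not modeled.

```python
import math

# Integer indefinite: the most negative value of each destination size.
INDEFINITE = {16: -(2**15), 32: -(2**31), 64: -(2**63)}

# RC field settings; Python's round() on a float rounds ties to even.
ROUNDERS = {
    "nearest": round,
    "down": math.floor,
    "up": math.ceil,
    "zero": math.trunc,
}


def fist(value: float, bits: int, rc: str = "nearest") -> int:
    """Model FIST/FISTP with #IA masked (hypothetical helper)."""
    if math.isnan(value) or math.isinf(value):
        return INDEFINITE[bits]  # #IA: masked response stores the integer indefinite
    n = ROUNDERS[rc](value)
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if not lo <= n <= hi:
        return INDEFINITE[bits]  # too large for DEST: #IA, integer indefinite stored
    return n
```

This reproduces the "0 or ±1" entries of the table: `fist(-0.5, 16)` stores 0 under round to nearest, while `fist(-0.5, 16, "down")` stores -1.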

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 
Operation

DEST ← Integer(ST(0));
IF Instruction = FISTP
    THEN
        PopRegisterStack;
FI;


FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.
Indicates rounding direction if the inexact exception (#P) is generated: 0 ← not roundup; 1 ← roundup.
Set to 0 otherwise.

C0, C2, C3 Undefined.


Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Converted value is too large for the destination format.
Source operand is an SNaN, QNaN, ±∞, or unsupported format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions 

#GP(0) If the destination is located in a non-writable segment.
If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.
FISTTP—Store Integer with Truncation


Opcode   Instruction     64-Bit Mode   Compat/Leg Mode   Description
DF /1    FISTTP m16int   Valid         Valid             Store ST(0) in m16int with truncation.
DB /1    FISTTP m32int   Valid         Valid             Store ST(0) in m32int with truncation.
DD /1    FISTTP m64int   Valid         Valid             Store ST(0) in m64int with truncation.


Description 

FISTTP converts the value in ST into a signed integer using truncation (chop) as the rounding mode, transfers the result to the destination, and pops ST. FISTTP accepts word, short integer, and long integer destinations.
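Because FISTTP always chops, its result does not depend on the RC field. A minimal sketch, assuming the invalid-operation exception is masked (so out-of-range and NaN/∞ sources store the integer indefinite) and ignoring the pop; `fisttp` is a hypothetical name.

```python
import math


def fisttp(value: float, bits: int) -> int:
    """Model FISTTP with #IA masked: truncate toward zero, ignoring RC (hypothetical helper)."""
    lo, hi = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    if math.isnan(value) or math.isinf(value):
        return lo  # integer indefinite (most negative value)
    n = math.trunc(value)  # chop toward zero
    return n if lo <= n <= hi else lo
```

For example, `fisttp(-1.7, 32)` stores -1, whereas FIST under the default round-to-nearest mode would store -2. This is the behavior C-style float-to-integer casts need without rewriting the control word.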

The following table shows the results obtained when storing various classes of numbers in integer format. 


Table 3-28. FISTTP Results

ST(0)                                     DEST
-∞ or Value Too Large for DEST Format     *
F ≤ -1                                    -I
-1 < F < +1                               0
F ≥ +1                                    +I
+∞ or Value Too Large for DEST Format     *
NaN                                       *

NOTES:

F Means finite floating-point value.

I Means integer.

* Indicates floating-point invalid-operation (#IA) exception.


This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

DEST ← ST;
pop ST;

Flags Affected 

C1 is cleared; C0, C2, C3 undefined.


Numeric Exceptions 

Invalid, Stack Invalid (stack underflow), Precision.


Protected Mode Exceptions 

#GP(0) If the destination is in a nonwritable segment. 

For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 
#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#NM If CR0.EM[bit 2] = 1.
If CR0.TS[bit 3] = 1.

#UD If CPUID.01H:ECX.SSE3[bit 0] = 0.
If the LOCK prefix is used.


Real-Address Mode Exceptions

#GP(0) If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.

#NM If CR0.EM[bit 2] = 1.
If CR0.TS[bit 3] = 1.

#UD If CPUID.01H:ECX.SSE3[bit 0] = 0.
If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If any part of the operand would lie outside of the effective address space from 0 to 0FFFFH.

#NM If CR0.EM[bit 2] = 1.
If CR0.TS[bit 3] = 1.

#UD If CPUID.01H:ECX.SSE3[bit 0] = 0.
If the LOCK prefix is used.

#PF(fault-code) For a page fault.

#AC(0) For unaligned memory reference if the current privilege level is 3.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.
FLD—Load Floating Point Value


Opcode    Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 /0     FLD m32fp     Valid         Valid             Push m32fp onto the FPU register stack.
DD /0     FLD m64fp     Valid         Valid             Push m64fp onto the FPU register stack.
DB /5     FLD m80fp     Valid         Valid             Push m80fp onto the FPU register stack.
D9 C0+i   FLD ST(i)     Valid         Valid             Push ST(i) onto the FPU register stack.


Description 

Pushes the source operand onto the FPU register stack. The source operand can be in single-precision, double-precision, or double extended-precision floating-point format. If the source operand is in single-precision or double-precision floating-point format, it is automatically converted to the double extended-precision floating-point format before being pushed on the stack.

The FLD instruction can also push the value in a selected FPU register [ST(i)] onto the stack. Here, pushing register ST(0) duplicates the stack top.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF SRC is ST(i)
    THEN
        temp ← ST(i);
FI;
TOP ← TOP - 1;
IF SRC is memory-operand
    THEN
        ST(0) ← ConvertToDoubleExtendedPrecisionFP(SRC);
    ELSE (* SRC is ST(i) *)
        ST(0) ← temp;
FI;
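The Operation steps above can be traced with a toy Python model of the register stack. This is a sketch under stated assumptions, not Intel's implementation: the class `X87Stack` and its methods are hypothetical, Python floats stand in for 80-bit double extended-precision values, and a stack overflow raises an error instead of loading the QNaN floating-point indefinite. It shows why ST(i) is copied to `temp` before TOP is decremented: once TOP moves, the same physical register is addressed as ST(i+1).

```python
class X87Stack:
    """Toy model of the eight x87 data registers (hypothetical helper).

    ST(i) names physical register R[(TOP + i) mod 8]. Python floats stand
    in for 80-bit values; None marks an empty register.
    """

    def __init__(self):
        self.regs = [None] * 8
        self.top = 0

    def st(self, i):
        """Return the value currently addressed as ST(i)."""
        return self.regs[(self.top + i) % 8]

    def fld(self, src, from_stack=False):
        """Model FLD: src is a value, or a stack index when from_stack."""
        # IF SRC is ST(i): read it before TOP moves.
        temp = self.st(src) if from_stack else float(src)
        # TOP <- TOP - 1, wrapping modulo 8.
        new_top = (self.top - 1) % 8
        if self.regs[new_top] is not None:
            # Destination already in use: stack overflow (#IS, C1 = 1).
            # Simplification: raise instead of loading the QNaN indefinite.
            raise OverflowError("x87 stack overflow")
        self.top = new_top
        self.regs[self.top] = temp
```

For example, `fld(0, from_stack=True)` duplicates the stack top, which is the FLD ST(0) case described above.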

FPU Flags Affected 

C1 Set to 1 if stack overflow occurred; otherwise, set to 0.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow or overflow occurred.

#IA Source operand is an SNaN. Does not occur if the source operand is in double extended-precision floating-point format (FLD m80fp or FLD ST(i)).

#D Source operand is a denormal value. Does not occur if the source operand is in double extended-precision floating-point format.


Protected Mode Exceptions

#GP(0) If destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FLD1/FLDL2T/FLDL2E/FLDPI/FLDLG2/FLDLN2/FLDZ-Load Constant


Opcode*   Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 E8     FLD1          Valid         Valid             Push +1.0 onto the FPU register stack.
D9 E9     FLDL2T        Valid         Valid             Push log₂10 onto the FPU register stack.
D9 EA     FLDL2E        Valid         Valid             Push log₂e onto the FPU register stack.
D9 EB     FLDPI         Valid         Valid             Push π onto the FPU register stack.
D9 EC     FLDLG2        Valid         Valid             Push log₁₀2 onto the FPU register stack.
D9 ED     FLDLN2        Valid         Valid             Push logₑ2 onto the FPU register stack.
D9 EE     FLDZ          Valid         Valid             Push +0.0 onto the FPU register stack.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Push one of seven commonly used constants (in double extended-precision floating-point format) onto the FPU register stack. The constants that can be loaded with these instructions include +1.0, +0.0, log₂10, log₂e, π, log₁₀2, and logₑ2. For each constant, an internal 66-bit constant is rounded (as specified by the RC field in the FPU control word) to double extended-precision floating-point format. The inexact-result exception (#P) is not generated as a result of the rounding, nor is the C1 flag set in the x87 FPU status word if the value is rounded up.
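As a cross-check, the seven values can be approximated with Python's `math` module. This is only an illustrative sketch (the dictionary `X87_CONSTANTS` is a hypothetical name): the instructions round an internal 66-bit constant to a 64-bit significand, while these are 53-bit IEEE doubles, so the bits will not match exactly.

```python
import math

# Values pushed by each load-constant instruction, approximated as
# IEEE 754 doubles (the x87 holds 64-bit-significand versions).
X87_CONSTANTS = {
    "FLD1": 1.0,
    "FLDL2T": math.log2(10.0),    # log2(10)
    "FLDL2E": math.log2(math.e),  # log2(e), equal to 1/ln(2)
    "FLDPI": math.pi,
    "FLDLG2": math.log10(2.0),    # log10(2)
    "FLDLN2": math.log(2.0),      # loge(2), the natural log of 2
    "FLDZ": 0.0,
}

for name, value in X87_CONSTANTS.items():
    print(f"{name:7s} {value!r}")
```

The reciprocal pairs (log₂10 with log₁₀2, log₂e with logₑ2) make a convenient sanity check on any table of these constants.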

See the section titled "Approximation of Pi" in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of the π constant.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

When the RC field is set to round-to-nearest, the FPU produces the same constants that are produced by the Intel 8087 and Intel 287 math coprocessors.

Operation 

TOP ← TOP - 1;
ST(0) ← CONSTANT;

FPU Flags Affected 

C1 Set to 1 if stack overflow occurred; otherwise, set to 0.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack overflow occurred.

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 



Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FLDCW-Load x87 FPU Control Word


Opcode    Instruction     64-Bit Mode   Compat/Leg Mode   Description
D9 /5     FLDCW m2byte    Valid         Valid             Load FPU control word from m2byte.


Description 

Loads the 16-bit source operand into the FPU control word. The source operand is a memory location. This instruction is typically used to establish or change the FPU's mode of operation.

If one or more exception flags are set in the FPU status word prior to loading a new FPU control word and the new control word unmasks one or more of those exceptions, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions; see the section titled "Software Exception Handling" in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1). To avoid raising exceptions when changing FPU operating modes, clear any pending exceptions (using the FCLEX or FNCLEX instruction) before loading the new control word.
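The Python sketch below shows how software might prepare a control-word value before loading it with FLDCW, and how to perform the pending-exception check described above. It assumes the x87 control-word layout from Volume 1, Chapter 8: exception masks IM through PM in bits 0 to 5 and rounding control RC in bits 10 and 11. It also assumes 037FH as the value established by FINIT/FNINIT. The helper names are hypothetical, and executing FLDCW itself still requires assembly or a compiler intrinsic.

```python
# x87 FPU control-word fields (bit positions per Vol. 1, Chapter 8).
EXCEPTION_MASKS = 0x003F  # IM, DM, ZM, OM, UM, PM in bits 0-5
RC_SHIFT = 10
RC_MASK = 0x0C00          # rounding control, bits 10-11

RC_NEAREST, RC_DOWN, RC_UP, RC_TRUNCATE = 0, 1, 2, 3
FNINIT_DEFAULT = 0x037F   # all masked, double extended precision, round to nearest


def with_rounding(cw: int, rc: int) -> int:
    """Return cw with its RC field replaced by rc (hypothetical helper)."""
    if not 0 <= rc <= 3:
        raise ValueError("RC is a 2-bit field")
    return (cw & ~RC_MASK & 0xFFFF) | (rc << RC_SHIFT)


def unmasks_pending(new_cw: int, status_word: int) -> bool:
    """True if new_cw clears the mask bit of an exception whose flag is
    already set in status_word, so the next waiting x87 instruction
    would raise it."""
    pending = status_word & EXCEPTION_MASKS  # IE..PE flags, bits 0-5
    unmasked = ~new_cw & EXCEPTION_MASKS
    return bool(pending & unmasked)


truncating_cw = with_rounding(FNINIT_DEFAULT, RC_TRUNCATE)
print(hex(truncating_cw))
```

If `unmasks_pending` returns True, clearing the flags first (FCLEX/FNCLEX) avoids the deferred exception.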

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

FPUControlWord ← SRC;


FPU Flags Affected 

C0, C1, C2, C3 Undefined.


Floating-Point Exceptions 

None; however, this operation might unmask a pending exception in the FPU status word. That exception is then 
generated upon execution of the next "waiting" floating-point instruction. 


Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
FLDENV—Load x87 FPU Environment


Opcode    Instruction          64-Bit Mode   Compat/Leg Mode   Description
D9 /4     FLDENV m14/28byte    Valid         Valid             Load FPU environment from m14byte or m28byte.


Description 

Loads the complete x87 FPU operating environment from memory into the FPU registers. The source operand specifies the first byte of the operating-environment data in memory. This data is typically written to the specified memory location by an FSTENV or FNSTENV instruction.

The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data 
pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer's 
Manual, Volume 1, show the layout in memory of the loaded environment, depending on the operating mode of the 
processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the 
real mode layouts are used. 

The FLDENV instruction should be executed in the same operating mode as the corresponding FSTENV/FNSTENV 
instruction. 

If one or more unmasked exception flags are set in the new FPU status word, a floating-point exception will be generated upon execution of the next floating-point instruction (except for the no-wait floating-point instructions; see the section titled "Software Exception Handling" in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1). To avoid generating exceptions when loading a new environment, clear all the exception flags in the FPU status word that is being loaded.
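One way to follow this advice is to edit a saved environment image before reloading it. The Python sketch below assumes the 28-byte, 32-bit protected-mode layout (shown in the Volume 1 figures referenced above). In that layout, the status word is the low 16 bits of the doubleword at byte offset 4. The sketch clears the same status bits that FCLEX/FNCLEX clear: the exception flags (bits 0 to 5), SF (bit 6), ES (bit 7), and B (bit 15). The function name is hypothetical, and the image would come from an earlier FSTENV/FNSTENV.

```python
import struct

FSW_OFFSET_32BIT = 4     # status word offset in the 28-byte image (assumed layout)
FSW_CLEAR_MASK = 0x80FF  # IE..PE (bits 0-5), SF (6), ES (7), B (15)


def clear_saved_exceptions(env: bytearray) -> None:
    """Clear exception-related status flags, in place, in a saved
    28-byte 32-bit protected-mode FPU environment image."""
    if len(env) != 28:
        raise ValueError("expected a 28-byte environment image")
    (fsw,) = struct.unpack_from("<H", env, FSW_OFFSET_32BIT)
    # Keep condition codes C0-C3 and TOP; drop the flags FNCLEX clears.
    struct.pack_into("<H", env, FSW_OFFSET_32BIT, fsw & ~FSW_CLEAR_MASK & 0xFFFF)
```

Only the status word changes. The control word, tag word, and pointers are left for FLDENV to restore as saved.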

If a page or limit fault occurs during the execution of this instruction, the state of the x87 FPU registers as seen by 
the fault handler may be different than the state being loaded from memory. In such situations, the fault handler 
should ignore the status of the x87 FPU registers, handle the fault, and return. The FLDENV instruction will then 
complete the loading of the x87 FPU registers with no resulting context inconsistency. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

FPUControlWord ← SRC[FPUControlWord];
FPUStatusWord ← SRC[FPUStatusWord];
FPUTagWord ← SRC[FPUTagWord];
FPUDataPointer ← SRC[FPUDataPointer];
FPUInstructionPointer ← SRC[FPUInstructionPointer];
FPULastInstructionOpcode ← SRC[FPULastInstructionOpcode];

FPU Flags Affected 

The C0, C1, C2, C3 flags are loaded.

Floating-Point Exceptions 

None; however, if an unmasked exception is loaded in the status word, it is generated upon execution of the next 
"waiting" floating-point instruction. 




Vol.2A 3-359 








INSTRUCTION SET REFERENCE, A-L 


Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FMUL/FMULP/FIMUL-Multiply


Opcode    Instruction          64-Bit Mode   Compat/Leg Mode   Description
D8 /1     FMUL m32fp           Valid         Valid             Multiply ST(0) by m32fp and store result in ST(0).
DC /1     FMUL m64fp           Valid         Valid             Multiply ST(0) by m64fp and store result in ST(0).
D8 C8+i   FMUL ST(0), ST(i)    Valid         Valid             Multiply ST(0) by ST(i) and store result in ST(0).
DC C8+i   FMUL ST(i), ST(0)    Valid         Valid             Multiply ST(i) by ST(0) and store result in ST(i).
DE C8+i   FMULP ST(i), ST(0)   Valid         Valid             Multiply ST(i) by ST(0), store result in ST(i), and pop the register stack.
DE C9     FMULP                Valid         Valid             Multiply ST(1) by ST(0), store result in ST(1), and pop the register stack.
DA /1     FIMUL m32int         Valid         Valid             Multiply ST(0) by m32int and store result in ST(0).
DE /1     FIMUL m16int         Valid         Valid             Multiply ST(0) by m16int and store result in ST(0).


Description 

Multiplies the destination and source operands and stores the product in the destination location. The destination 
operand is always an FPU data register; the source operand can be an FPU data register or a memory location. 
Source operands in memory can be in single-precision or double-precision floating-point format or in word or 
doubleword integer format. 

The no-operand version of the instruction multiplies the contents of the ST(1) register by the contents of the ST(0) register and stores the product in the ST(1) register. The one-operand version multiplies the contents of the ST(0) register by the contents of a memory location (either a floating-point or an integer value) and stores the product in the ST(0) register. The two-operand version multiplies the contents of the ST(0) register by the contents of the ST(i) register, or vice versa, with the result being stored in the register specified with the first operand (the destination operand).

The FMULP instructions perform the additional operation of popping the FPU register stack after storing the 
product. To pop the register stack, the processor marks the ST(0) register as empty and increments the stack 
pointer (TOP) by 1. The no-operand version of the floating-point multiply instructions always results in the register 
stack being popped. In some assemblers, the mnemonic for this instruction is FMUL rather than FMULP. 

The FIMUL instructions convert an integer source operand to double extended-precision floating-point format before performing the multiplication.

The sign of the result is always the exclusive-OR of the source signs, even if one or more of the values being multiplied is 0 or ∞. When the source operand is an integer 0, it is treated as a +0.
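Python floats follow the same IEEE 754 sign and special-value rules, so they are a convenient way to check individual entries of the results table. This sketch uses 64-bit doubles rather than the x87 double extended format. Where the x87 signals #IA for 0 × ∞, Python silently returns a quiet NaN. The helper `sign_of` is hypothetical.

```python
import math


def sign_of(x: float) -> str:
    """Return '+' or '-' from the sign bit; works for signed zeros too."""
    return "-" if math.copysign(1.0, x) < 0 else "+"


# The result sign is the exclusive-OR of the operand signs, even for zeros.
print(sign_of(-0.0 * 5.0))         # -0 times +F: sign is '-'
print(sign_of(-0.0 * -5.0))        # -0 times -F: sign is '+'
print(-math.inf * -2.0)            # -inf times -F: +inf
print(math.isnan(0.0 * math.inf))  # a (*) entry: invalid operation gives NaN
# An integer 0 source converts to +0 (FIMUL), so -F times integer 0 is -0.
print(sign_of(-3.0 * float(0)))
```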

The following table shows the results obtained when multiplying various classes of numbers, assuming that neither 
overflow nor underflow occurs. 
Table 3-29. FMUL/FMULP/FIMUL Results

                               DEST
              -∞     -F     -0     +0     +F     +∞     NaN
SRC   -∞      +∞     +∞     *      *      -∞     -∞     NaN
      -F      +∞     +F     +0     -0     -F     -∞     NaN
      -I      +∞     +F     +0     -0     -F     -∞     NaN
      -0      *      +0     +0     -0     -0     *      NaN
      +0      *      -0     -0     +0     +0     *      NaN
      +I      -∞     -F     -0     +0     +F     +∞     NaN
      +F      -∞     -F     -0     +0     +F     +∞     NaN
      +∞      -∞     -∞     *      *      +∞     +∞     NaN
      NaN     NaN    NaN    NaN    NaN    NaN    NaN    NaN

NOTES:

F Means finite floating-point value.

I Means integer.

* Indicates invalid-arithmetic-operand (#IA) exception.


This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF Instruction = FIMUL
    THEN
        DEST ← DEST * ConvertToDoubleExtendedPrecisionFP(SRC);
    ELSE (* Source operand is floating-point value *)
        DEST ← DEST * SRC;
FI;
IF Instruction = FMULP
    THEN
        PopRegisterStack;
FI;


FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.


Floating-Point Exceptions 


#IS Stack underflow occurred.

#IA Operand is an SNaN value or unsupported format.

One operand is ±0 and the other is ±∞.

#D Source operand is a denormal value.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.
Value cannot be represented exactly in destination format. 
Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FNOP—No Operation


Opcode    Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 D0     FNOP          Valid         Valid             No operation is performed.


Description 

Performs no FPU operation. This instruction takes up space in the instruction stream but does not affect the FPU or 
machine context, except the EIP register and the FPU Instruction Pointer. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

FPU Flags Affected 

C0, C1, C2, C3 Undefined.

Floating-Point Exceptions 

None 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FPATAN—Partial Arctangent


Opcode*   Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 F3     FPATAN        Valid         Valid             Replace ST(1) with arctan(ST(1)/ST(0)) and pop the register stack.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Computes the arctangent of the source operand in register ST(1) divided by the source operand in register ST(0), stores the result in ST(1), and pops the FPU register stack. The result in register ST(0) has the same sign as the source operand ST(1) and a magnitude less than +π.

The FPATAN instruction returns the angle between the X axis and the line from the origin to the point (X,Y), where Y (the ordinate) is ST(1) and X (the abscissa) is ST(0). The angle depends on the sign of X and Y independently, not just on the sign of the ratio Y/X. This is because a point (-X,Y) is in the second quadrant, resulting in an angle between π/2 and π, while a point (X,-Y) is in the fourth quadrant, resulting in an angle between 0 and -π/2. A point (-X,-Y) is in the third quadrant, giving an angle between -π/2 and -π.
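FPATAN corresponds to the two-argument arctangent that C and Python expose as atan2(y, x), with y = ST(1) and x = ST(0). The Python sketch below uses `math.atan2` on doubles to show the quadrant behavior and a few signed-zero and infinity cases. It is an analogy on 64-bit values, not a model of x87 precision or exception reporting, and `fpatan_like` is a hypothetical name.

```python
import math


def fpatan_like(st1: float, st0: float) -> float:
    """Angle of the point (X, Y) = (ST(0), ST(1)), as FPATAN returns it."""
    return math.atan2(st1, st0)


# One point per quadrant: the signs of X and Y matter separately.
for y, x in [(1.0, 1.0), (1.0, -1.0), (-1.0, -1.0), (-1.0, 1.0)]:
    angle = fpatan_like(y, x)
    print(f"Y={y:+}, X={x:+}: {angle / math.pi:+.2f} * pi")
```

Signed zeros and infinities behave like the corresponding table entries, for example (Y, X) = (+0, -0) gives +π and (+∞, +∞) gives +π/4.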

The following table shows the results obtained when computing the arctangent of various classes of numbers, 
assuming that underflow does not occur. 

Table 3-30. FPATAN Results

                                         ST(0)
              -∞        -F              -0       +0       +F              +∞       NaN
ST(1)  -∞     -3π/4*    -π/2            -π/2     -π/2     -π/2            -π/4*    NaN
       -F     -π        -π to -π/2      -π/2     -π/2     -π/2 to -0      -0       NaN
       -0     -π        -π              -π*      -0*      -0              -0       NaN
       +0     +π        +π              +π*      +0*      +0              +0       NaN
       +F     +π        +π to +π/2      +π/2     +π/2     +π/2 to +0      +0       NaN
       +∞     +3π/4*    +π/2            +π/2     +π/2     +π/2            +π/4*    NaN
       NaN    NaN       NaN             NaN      NaN      NaN             NaN      NaN


NOTES: 

F Means finite floating-point value. 

* Table 8-10 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, specifies that the ratios 0/0 and ∞/∞ generate the floating-point invalid arithmetic-operation exception and, if this exception is masked, the floating-point QNaN indefinite value is returned. With the FPATAN instruction, the 0/0 or ∞/∞ value is actually not calculated using division. Instead, the arctangent of the two variables is derived from a standard mathematical formulation that is generalized to allow complex numbers as arguments. In this complex variable formulation, arctangent(0,0) and similar cases have well-defined values. These values are needed to develop a library to compute transcendental functions with complex arguments, based on the FPU functions that only allow floating-point values as arguments.

There is no restriction on the range of source operands that FPATAN can accept. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

The source operands for this instruction are restricted for the 80287 math coprocessor to the following range: 

0 ≤ |ST(1)| < |ST(0)| < +∞




Operation 

ST(1) ← arctan(ST(1) / ST(0));

PopRegisterStack; 

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Source operand is an SNaN value or unsupported format. 

#D Source operand is a denormal value. 

#U Result is too small for destination format. 

#P Value cannot be represented exactly in destination format. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 
FPREM—Partial Remainder


Opcode    Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 F8     FPREM         Valid         Valid             Replace ST(0) with the remainder obtained from dividing ST(0) by ST(1).


Description 

Computes the remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the 
ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following 
value: 

Remainder <- ST(0) - (Q * ST(1)) 

Here, Q is an integer value that is obtained by truncating the floating-point number quotient of [ST(0) / ST(1)] 
toward zero. The sign of the remainder is the same as the sign of the dividend. The magnitude of the remainder is 
less than that of the modulus, unless a partial remainder was computed (as described below). 

This instruction produces an exact result; the inexact-result exception does not occur and the rounding control has 
no effect. The following table shows the results obtained when computing the remainder of various classes of 
numbers, assuming that underflow does not occur. 


Table 3-31. FPREM Results

                                    ST(1)
              -∞       -F            -0     +0     +F            +∞       NaN
ST(0)  -∞     *        *             *      *      *             *        NaN
       -F     ST(0)    -F or -0      **     **     -F or -0      ST(0)    NaN
       -0     -0       -0            *      *      -0            -0       NaN
       +0     +0       +0            *      *      +0            +0       NaN
       +F     ST(0)    +F or +0      **     **     +F or +0      ST(0)    NaN
       +∞     *        *             *      *      *             *        NaN
       NaN    NaN      NaN           NaN    NaN    NaN           NaN      NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

** Indicates floating-point zero-divide (#Z) exception. 

When the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).

The FPREM instruction does not compute the remainder specified in IEEE Std 754. The IEEE-specified remainder can be computed with the FPREM1 instruction. The FPREM instruction is provided for compatibility with the Intel 8087 and Intel 287 math coprocessors.

The FPREM instruction gets its name "partial remainder" because of the way it computes the remainder. This 
instruction arrives at a remainder through iterative subtraction. It can, however, reduce the exponent of ST(0) by 
no more than 63 in one execution of the instruction. If the instruction succeeds in producing a remainder that is 
less than the modulus, the operation is complete and the C2 flag in the FPU status word is cleared. Otherwise, C2 
is set, and the result in ST(0) is called the partial remainder. The exponent of the partial remainder will be less 
than the exponent of the original dividend by at least 32. Software can re-execute the instruction (using the partial 
remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context switch between the instructions in the loop.)
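The partial-remainder loop can be simulated exactly with Python's `fractions.Fraction`, following the Operation pseudocode given later in this section. This is a sketch under stated assumptions. The implementation-dependent N is fixed at 32, and `exponent()` returns floor(log2|v|) as a stand-in for the unbiased exponent. Operands are assumed finite with a nonzero divisor, and each call to the hypothetical `fprem_step` stands for one execution of FPREM. Repeating the step until C2 is clear yields the same truncated remainder that `math.fmod` computes.

```python
import math
from fractions import Fraction

N = 32  # assumed value of the implementation-dependent N (32..63)


def exponent(v: Fraction) -> int:
    """floor(log2(|v|)) for nonzero v: the unbiased binary exponent."""
    v = abs(v)
    e = v.numerator.bit_length() - v.denominator.bit_length()
    if Fraction(2) ** e > v:
        e -= 1
    return e


def fprem_step(st0: Fraction, st1: Fraction):
    """One modeled FPREM execution; returns (new ST(0), C2)."""
    if st0 == 0:
        return st0, 0
    d = exponent(st0) - exponent(st1)
    if d < 64:
        q = math.trunc(st0 / st1)           # truncate toward zero
        return st0 - st1 * q, 0             # reduction complete
    scale = Fraction(2) ** (d - N)
    qq = math.trunc((st0 / st1) / scale)
    return st0 - st1 * qq * scale, 1        # partial remainder, C2 = 1


def fprem_loop(x: float, y: float) -> tuple[float, int]:
    """Re-execute the step until C2 is clear; returns (remainder, steps)."""
    st0, st1, steps = Fraction(x), Fraction(y), 0
    while True:
        st0, c2 = fprem_step(st0, st1)
        steps += 1
        if c2 == 0:
            return float(st0), steps
```

For example, `fprem_loop(1e300, 3.0)` needs many executions because the exponent difference is far above 63, while `fprem_loop(-7.5, 2.0)` completes in one and keeps the dividend's sign.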

An important use of the FPREM instruction is to reduce the arguments of periodic functions. When reduction is complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU status word. This information is important in argument reduction for the tangent function (using a modulus of π/4), because it locates the original angle in the correct one of eight sectors of the unit circle.
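The Python sketch below shows the mapping from the quotient's low bits to the condition codes. The helper names are hypothetical, and taking the bits from the magnitude of Q is an assumption of this sketch. The point is only the bit order C0 = Q2, C3 = Q1, C1 = Q0, and how software reassembles the sector number 0 to 7.

```python
def quotient_to_flags(q: int) -> dict:
    """Condition codes after a complete reduction (C2 = 0)."""
    q = abs(q)  # assumption of this sketch: bits of the quotient's magnitude
    return {
        "C0": (q >> 2) & 1,  # Q2
        "C3": (q >> 1) & 1,  # Q1
        "C1": q & 1,         # Q0
        "C2": 0,             # reduction complete
    }


def sector(flags: dict) -> int:
    """Reassemble Q mod 8 (the sector index) from C0, C3, and C1."""
    return (flags["C0"] << 2) | (flags["C3"] << 1) | flags["C1"]
```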

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 


Operation 

D ← exponent(ST(0)) - exponent(ST(1));
IF D < 64
    THEN
        Q ← Integer(TruncateTowardZero(ST(0) / ST(1)));
        ST(0) ← ST(0) - (ST(1) * Q);
        C2 ← 0;
        C0, C3, C1 ← LeastSignificantBits(Q); (* Q2, Q1, Q0 *)
    ELSE
        C2 ← 1;
        N ← An implementation-dependent number between 32 and 63;
        QQ ← Integer(TruncateTowardZero((ST(0) / ST(1)) / 2^(D - N)));
        ST(0) ← ST(0) - (ST(1) * QQ * 2^(D - N));
FI;


FPU Flags Affected 

C0 Set to bit 2 (Q2) of the quotient.

C1 Set to 0 if stack underflow occurred; otherwise, set to least significant bit of quotient (Q0).

C2 Set to 0 if reduction complete; set to 1 if incomplete.

C3 Set to bit 1 (Q1) of the quotient.


Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Source operand is an SNaN value, modulus is 0, dividend is ∞, or unsupported format.

#D Source operand is a denormal value. 

#U Result is too small for destination format. 


Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 
FPREM1—Partial Remainder


Opcode    Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 F5     FPREM1        Valid         Valid             Replace ST(0) with the IEEE remainder obtained from dividing ST(0) by ST(1).


Description 

Computes the IEEE remainder obtained from dividing the value in the ST(0) register (the dividend) by the value in the ST(1) register (the divisor or modulus), and stores the result in ST(0). The remainder represents the following value:

Remainder <- ST(0) - (Q * ST(1)) 

Here, Q is an integer value that is obtained by rounding the floating-point number quotient of [ST(0) / ST(1)] 
toward the nearest integer value. The magnitude of the remainder is less than or equal to half the magnitude of the 
modulus, unless a partial remainder was computed (as described below). 

This instruction produces an exact result; the precision (inexact) exception does not occur and the rounding control 
has no effect. The following table shows the results obtained when computing the remainder of various classes of 
numbers, assuming that underflow does not occur. 


Table 3-32. FPREM1 Results

                                     ST(1)
              -∞       -F             -0     +0     +F             +∞       NaN
ST(0)  -∞     *        *              *      *      *              *        NaN
       -F     ST(0)    ±F or -0       **     **     ±F or -0       ST(0)    NaN
       -0     -0       -0             *      *      -0             -0       NaN
       +0     +0       +0             *      *      +0             +0       NaN
       +F     ST(0)    ±F or +0       **     **     ±F or +0       ST(0)    NaN
       +∞     *        *              *      *      *              *        NaN
       NaN    NaN      NaN            NaN    NaN    NaN            NaN      NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

** Indicates floating-point zero-divide (#Z) exception. 

When the result is 0, its sign is the same as that of the dividend. When the modulus is ∞, the result is equal to the value in ST(0).

The FPREM1 instruction computes the remainder specified in IEEE Standard 754. This instruction operates differently from the FPREM instruction in the way that it rounds the quotient of ST(0) divided by ST(1) to an integer (see the "Operation" section below).

Like the FPREM instruction, FPREM1 computes the remainder through iterative subtraction, but can reduce the
exponent of ST(0) by no more than 63 in one execution of the instruction. If the instruction succeeds in producing
a remainder that is less than one half the modulus, the operation is complete and the C2 flag in the FPU status word
is cleared. Otherwise, C2 is set, and the result in ST(0) is called the partial remainder. The exponent of the partial
remainder will be less than the exponent of the original dividend by at least 32. Software can re-execute the
instruction (using the partial remainder in ST(0) as the dividend) until C2 is cleared. (Note that while executing
such a remainder-computation loop, a higher-priority interrupting routine that needs the FPU can force a context
switch between the instructions in the loop.)
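The partial-remainder loop described above can be simulated in Python. This is a hedged sketch, not Intel's algorithm: it uses exact `Fraction` arithmetic in place of the 80-bit registers, assumes a finite, nonzero modulus, and lets the parameter `n` stand in for the implementation-dependent N from the Operation section (32 is an arbitrary choice within the documented range of 32 to 63). All helper names are hypothetical.

```python
import math
from fractions import Fraction


def exponent(v: Fraction) -> int:
    """Unbiased binary exponent e such that 2**e <= |v| < 2**(e + 1), for v != 0."""
    num, den = abs(v.numerator), v.denominator
    e = num.bit_length() - den.bit_length()      # correct, or one too high
    lhs = num << -e if e < 0 else num
    rhs = den << e if e > 0 else den
    return e - 1 if lhs < rhs else e


def fprem1_step(st0: Fraction, st1: Fraction, n: int = 32):
    """One modeled FPREM1 execution; returns (new ST(0), C2)."""
    if st0 == 0:
        return st0, 0                            # nothing left to reduce
    d = exponent(st0) - exponent(st1)
    if d < 64:
        q = round(st0 / st1)                     # nearest integer, ties to even
        return st0 - st1 * q, 0                  # reduction complete: C2 = 0
    scale = Fraction(2) ** (d - n)
    qq = math.trunc(st0 / st1 / scale)           # truncate toward zero
    return st0 - st1 * qq * scale, 1             # partial remainder: C2 = 1


def fprem1_until_done(dividend: float, modulus: float, n: int = 32):
    """Re-execute the modeled instruction until C2 is clear.

    Returns (remainder, number of executions).
    """
    st0, st1 = Fraction(dividend), Fraction(modulus)
    executions = 0
    while True:
        st0, c2 = fprem1_step(st0, st1, n)
        executions += 1
        if c2 == 0:
            return float(st0), executions
```

Under these assumptions, `fprem1_until_done(1e30, 3.0)` needs more than one execution because the exponents differ by 98, and it ends with the same value as `math.remainder(1e30, 3.0)`. Each partial step subtracts an even multiple of the modulus (2^(D - N) is at least 2), so the tie-breaking parity of the final quotient is unaffected.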

An important use of the FPREM1 instruction is to reduce the arguments of periodic functions. When reduction is
complete, the instruction stores the three least-significant bits of the quotient in the C3, C1, and C0 flags of the FPU
status word. This information is important in argument reduction for the tangent function (using a modulus of π/4),
because it locates the original angle in the correct one of eight sectors of the unit circle.
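Software usually reads these condition codes by storing the status word (for example with FNSTSW AX) and testing bits. The Python sketch below rebuilds the three low quotient bits from a status-word value, assuming the standard positions of C0, C1, C2, and C3 at bits 8, 9, 10, and 14 of the x87 FPU status word; the function name is hypothetical.

```python
# Condition-code bit positions in the x87 FPU status word.
C0_BIT, C1_BIT, C2_BIT, C3_BIT = 8, 9, 10, 14


def quotient_low_bits(status_word: int) -> int:
    """Return (Q2 Q1 Q0) as an integer 0-7 after a completed FPREM1.

    Per the Operation section: C0 <- Q2, C3 <- Q1, C1 <- Q0.
    """
    if (status_word >> C2_BIT) & 1:
        raise ValueError("C2 is set: reduction incomplete, execute FPREM1 again")
    q2 = (status_word >> C0_BIT) & 1
    q1 = (status_word >> C3_BIT) & 1
    q0 = (status_word >> C1_BIT) & 1
    return (q2 << 2) | (q1 << 1) | q0
```

With a modulus of π/4, the returned value is the sector number that the paragraph above refers to.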

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 


Operation 

D ← exponent(ST(0)) - exponent(ST(1));
IF D < 64
    THEN
        Q ← Integer(RoundTowardNearestInteger(ST(0) / ST(1)));
        ST(0) ← ST(0) - (ST(1) * Q);
        C2 ← 0;
        C0, C3, C1 ← LeastSignificantBits(Q); (* Q2, Q1, Q0 *)
    ELSE
        C2 ← 1;
        N ← An implementation-dependent number between 32 and 63;
        QQ ← Integer(TruncateTowardZero((ST(0) / ST(1)) / 2^(D - N)));
        ST(0) ← ST(0) - (ST(1) * QQ * 2^(D - N));
FI;


FPU Flags Affected 

C0 Set to bit 2 (Q2) of the quotient.

C1 Set to 0 if stack underflow occurred; otherwise, set to least significant bit of quotient (Q0).

C2 Set to 0 if reduction complete; set to 1 if incomplete.

C3 Set to bit 1 (Q1) of the quotient.


Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Source operand is an SNaN value, modulus (divisor) is 0, dividend is ∞, or unsupported format.

#D Source operand is a denormal value. 

#U Result is too small for destination format. 


Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 


3-370 Vol. 2A 


FPREM1—Partial Remainder 


INSTRUCTION SET REFERENCE, A-L 


FPTAN—Partial Tangent 


Opcode    Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 F2     FPTAN          Valid          Valid              Replace ST(0) with its approximate tangent and push 1 onto the FPU stack.


Description 

Computes the approximate tangent of the source operand in register ST(0), stores the result in ST(0), and pushes
a 1.0 onto the FPU register stack. The source operand must be given in radians and must be less than ±2^63. The
following table shows the unmasked results obtained when computing the partial tangent of various classes of
numbers, assuming that underflow does not occur.

Table 3-33. FPTAN Results

ST(0) SRC    ST(0) DEST
-∞           *
-F           -F to +F
-0           -0
+0           +0
+F           -F to +F
+∞           *
NaN          NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in
register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of
range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range
-2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π.
However, even within the range -2^63 to +2^63, inaccurate results can occur because the finite approximation of π
used internally for argument reduction is not sufficient in all cases. Therefore, for accurate results it is safe to apply
FPTAN only to arguments reduced accurately in software, to a value smaller in absolute value than 3π/8. See the
sections titled "Approximation of Pi" and "Transcendental Instruction Accuracy" in Chapter 8 of the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in
performing such reductions.

The value 1.0 is pushed onto the register stack after the tangent has been computed to maintain compatibility with 
the Intel 8087 and Intel287 math coprocessors. This operation also simplifies the calculation of other trigonometric 
functions. For instance, the cotangent (which is the reciprocal of the tangent) can be computed by executing an
FDIVR instruction after the FPTAN instruction.
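The stack effect of FPTAN and the cotangent idiom can be modeled in Python. This sketch assumes finite source values, uses a Python list as the register stack with the last element as ST(0), and relies on `math.tan` in place of the hardware approximation; stack overflow and the exception cases of Table 3-33 are not modeled. It also assumes that the no-operand FDIVR behaves as FDIVRP ST(1), ST(0), dividing ST(0) by ST(1), storing the quotient in ST(1), and popping. The function names are hypothetical.

```python
import math

TWO_63 = 2.0 ** 63


def fptan(stack: list) -> int:
    """Model FPTAN on a finite ST(0) (stack[-1]); returns the new C2 value."""
    x = stack[-1]
    if not -TWO_63 < x < TWO_63:
        return 1                      # out of range: ST(0) unchanged, C2 = 1
    stack[-1] = math.tan(x)           # ST(0) <- approximate tangent
    stack.append(1.0)                 # TOP <- TOP - 1; ST(0) <- 1.0
    return 0


def fdivrp_st1_st0(stack: list) -> None:
    """Model FDIVRP ST(1), ST(0): ST(1) <- ST(0) / ST(1), then pop."""
    stack[-2] = stack[-1] / stack[-2]
    stack.pop()
```

Starting from `stack = [math.pi / 4]`, calling `fptan(stack)` and then `fdivrp_st1_st0(stack)` leaves a single value close to 1.0, the cotangent of π/4.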

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 
Operation

IF ST(0) < 2^63
    THEN
        C2 ← 0;
        ST(0) ← fptan(ST(0)); // approximation of tan
        TOP ← TOP - 1;
        ST(0) ← 1.0;
    ELSE (* Source operand is out-of-range *)
        C2 ← 1;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred.

Set if result was rounded up; cleared otherwise.

C2 Set to 1 if outside range (-2^63 < source operand < +2^63); otherwise, set to 0.

C0, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow or overflow occurred.

#IA Source operand is an SNaN value, or unsupported format. 

#D Source operand is a denormal value. 

#U Result is too small for destination format. 

#P Value cannot be represented exactly in destination format. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FRNDINT—Round to Integer


Opcode    Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 FC     FRNDINT        Valid          Valid              Round ST(0) to an integer.


Description 

Rounds the source value in the ST(0) register to the nearest integral value, depending on the current rounding 
mode (setting of the RC field of the FPU control word), and stores the result in ST(0). 

If the source value is ∞, the value is not changed. If the source value is not an integral value, the floating-point
inexact-result exception (#P) is generated.
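The effect of the RC field can be sketched in Python. The encodings used here (00B round to nearest, 01B round down toward -∞, 10B round up toward +∞, 11B round toward zero) follow the x87 FPU control word; the function name and the returned `inexact` flag, which stands in for #P, are illustrative assumptions, and SNaN handling is not modeled.

```python
import math


def frndint(x: float, rc: int = 0b00):
    """Round x to an integral value under RC; returns (result, inexact)."""
    if math.isinf(x) or math.isnan(x):
        return x, False                      # infinities (and, here, NaNs) pass through
    if rc == 0b00:
        r = round(x)                         # nearest, ties to even
    elif rc == 0b01:
        r = math.floor(x)                    # toward -infinity
    elif rc == 0b10:
        r = math.ceil(x)                     # toward +infinity
    elif rc == 0b11:
        r = math.trunc(x)                    # toward zero
    else:
        raise ValueError("RC is a 2-bit field")
    result = math.copysign(float(r), x)      # keeps -0.0, e.g. -0.4 rounds to -0
    return result, result != x
```

For instance, 2.5 rounds to 2.0 and 3.5 to 4.0 under the default RC = 00B, and both report an inexact result.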

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(0) ← RoundToIntegralValue(ST(0));

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Source operand is an SNaN value or unsupported format. 

#D Source operand is a denormal value. 

#P Source operand is not an integral value. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FRSTOR—Restore x87 FPU State


Opcode    Instruction           64-Bit Mode    Compat/Leg Mode    Description
DD /4     FRSTOR m94/108byte    Valid          Valid              Load FPU state from m94byte or m108byte.


Description 

Loads the FPU state (operating environment and register stack) from the memory area specified with the source
operand. This state data is typically written to the specified memory location by a previous FSAVE/FNSAVE
instruction.

The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data 
pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer's 
Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the 
processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the 
real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately following 
the operating environment image. 
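As a sketch of that layout, the 108-byte image (32-bit protected-mode format) can be decoded in Python. The offsets used here, control word at byte 0, status word at byte 4, tag word at byte 8, and eight 10-byte registers starting at byte 28 in ST(0) to ST(7) order, are assumptions based on the 32-bit protected-mode layout in Volume 1 and on the Operation section below; the pointer and opcode fields are skipped, and the helper names are hypothetical.

```python
import math
import struct

ENV32_SIZE = 28      # 32-bit protected-mode environment image
REG_SIZE = 10        # one 80-bit double extended-precision register


def decode_fp80(raw: bytes) -> float:
    """Convert a finite 80-bit register image to a Python float (may lose precision)."""
    significand, sign_exp = struct.unpack("<QH", raw)   # 64-bit significand, then sign/exponent
    exponent = sign_exp & 0x7FFF
    if exponent == 0x7FFF:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if exponent == 0:
        exponent = 1                                   # denormals use the minimum exponent
    # math.ldexp raises OverflowError if the value exceeds the double range.
    value = math.ldexp(significand, exponent - 16383 - 63)
    return -value if sign_exp & 0x8000 else value


def parse_fsave_image32(image: bytes):
    """Return (control word, status word, tag word, [ST(0) .. ST(7)])."""
    if len(image) != ENV32_SIZE + 8 * REG_SIZE:
        raise ValueError("expected a 108-byte image")
    fcw, fsw, ftw = struct.unpack_from("<HxxHxxHxx", image, 0)
    regs = [
        decode_fp80(image[ENV32_SIZE + i * REG_SIZE: ENV32_SIZE + (i + 1) * REG_SIZE])
        for i in range(8)
    ]
    return fcw, fsw, ftw, regs
```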

The FRSTOR instruction should be executed in the same operating mode as the corresponding FSAVE/FNSAVE 
instruction. 

If one or more unmasked exception bits are set in the new FPU status word, a floating-point exception will be 
generated. To avoid raising exceptions when loading a new operating environment, clear all the exception flags in 
the FPU status word that is being loaded. 
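For example, code that edits a saved image before reloading it might clear the flags in the stored status word first. This Python sketch assumes the 32-bit protected-mode layout (status word at byte offset 4) and clears bits 0 through 7: the exception flags IE, DE, ZE, OE, UE, and PE, plus SF and ES, which are the flags FNCLEX clears apart from the busy bit B. The names are hypothetical.

```python
import struct

FSW_OFFSET_32 = 4       # status word offset in the 32-bit protected-mode image
FLAG_MASK = 0x00FF      # IE, DE, ZE, OE, UE, PE (bits 0-5), SF (bit 6), ES (bit 7)


def clear_saved_exception_flags(image: bytearray) -> None:
    """Clear exception-related flags in a saved status word, keeping TOP and C0-C3."""
    (fsw,) = struct.unpack_from("<H", image, FSW_OFFSET_32)
    struct.pack_into("<H", image, FSW_OFFSET_32, fsw & ~FLAG_MASK & 0xFFFF)
```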

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

FPUControlWord ← SRC[FPUControlWord];
FPUStatusWord ← SRC[FPUStatusWord];
FPUTagWord ← SRC[FPUTagWord];
FPUDataPointer ← SRC[FPUDataPointer];
FPUInstructionPointer ← SRC[FPUInstructionPointer];
FPULastInstructionOpcode ← SRC[FPULastInstructionOpcode];
ST(0) ← SRC[ST(0)];
ST(1) ← SRC[ST(1)];
ST(2) ← SRC[ST(2)];
ST(3) ← SRC[ST(3)];
ST(4) ← SRC[ST(4)];
ST(5) ← SRC[ST(5)];
ST(6) ← SRC[ST(6)];
ST(7) ← SRC[ST(7)];

FPU Flags Affected 

The C0, C1, C2, C3 flags are loaded.

Floating-Point Exceptions 

None; however, this operation might unmask an existing exception that has been detected but not generated,
because it was masked. Here, the exception is generated at the completion of the instruction.
Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 




FSAVE/FNSAVE-Store x87 FPU State 


Opcode      Instruction            64-Bit Mode    Compat/Leg Mode    Description
9B DD /6    FSAVE m94/108byte      Valid          Valid              Store FPU state to m94byte or m108byte after checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.
DD /6       FNSAVE* m94/108byte    Valid          Valid              Store FPU environment to m94byte or m108byte without checking for pending unmasked floating-point exceptions. Then re-initialize the FPU.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Stores the current FPU state (operating environment and register stack) at the specified destination in memory, 
and then re-initializes the FPU. The FSAVE instruction checks for and handles pending unmasked floating-point 
exceptions before storing the FPU state; the FNSAVE instruction does not. 

The FPU operating environment consists of the FPU control word, status word, tag word, instruction pointer, data 
pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and IA-32 Architectures Software Developer's 
Manual, Volume 1, show the layout in memory of the stored environment, depending on the operating mode of the 
processor (protected or real) and the current operand-size attribute (16-bit or 32-bit). In virtual-8086 mode, the 
real mode layouts are used. The contents of the FPU register stack are stored in the 80 bytes immediately following
the operating environment image.

The saved image reflects the state of the FPU after all floating-point instructions preceding the FSAVE/FNSAVE 
instruction in the instruction stream have been executed. 

After the FPU state has been saved, the FPU is reset to the same default values it is set to with the FINIT/FNINIT 
instructions (see "FINIT/FNINIT—Initialize Floating-Point Unit" in this chapter). 
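The save-then-reinitialize sequence can be modeled in Python as a sketch. `X87State` is a hypothetical stand-in for the FPU state, with the reset values taken from the Operation section (control word 037FH, status word 0, tag word FFFFH, pointers and opcode 0). The register contents are copied into the image but not cleared, since only the tag word marks them empty. The function is named after FNSAVE because it does not model FSAVE's check for pending exceptions.

```python
from dataclasses import dataclass, field, replace


@dataclass
class X87State:
    control: int = 0x037F
    status: int = 0x0000
    tag: int = 0xFFFF
    data_pointer: int = 0
    instruction_pointer: int = 0
    last_opcode: int = 0
    regs: list = field(default_factory=lambda: [0.0] * 8)   # ST(0) .. ST(7)


def fnsave(state: X87State) -> X87State:
    """Return a snapshot of the state, then reset the live state as FINIT/FNINIT would."""
    image = replace(state, regs=list(state.regs))   # DEST <- current state (copied)
    state.control, state.status, state.tag = 0x037F, 0x0000, 0xFFFF
    state.data_pointer = state.instruction_pointer = state.last_opcode = 0
    return image
```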

The FSAVE/FNSAVE instructions are typically used when the operating system needs to perform a context switch, 
an exception handler needs to use the FPU, or an application program needs to pass a "clean" FPU to a procedure. 

The assembler issues two instructions for the FSAVE instruction (an FWAIT instruction followed by an FNSAVE
instruction), and the processor executes each of these instructions separately. If an exception is generated for
either of these instructions, the saved EIP points to the instruction that caused the exception.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

For Intel math coprocessors and FPUs prior to the Intel Pentium processor, an FWAIT instruction should be 
executed before attempting to read from the memory image stored with a prior FSAVE/FNSAVE instruction. This 
FWAIT instruction helps ensure that the storage operation has been completed. 

When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual
circumstances) for an FNSAVE instruction to be interrupted prior to being executed to handle a pending FPU
exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the Intel®
64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances. An
FNSAVE instruction cannot be interrupted in this way on later Intel processors, except for the Intel Quark™ X1000
processor.
Operation

(* Save FPU State and Registers *)
DEST[FPUControlWord] ← FPUControlWord;
DEST[FPUStatusWord] ← FPUStatusWord;
DEST[FPUTagWord] ← FPUTagWord;
DEST[FPUDataPointer] ← FPUDataPointer;
DEST[FPUInstructionPointer] ← FPUInstructionPointer;
DEST[FPULastInstructionOpcode] ← FPULastInstructionOpcode;
DEST[ST(0)] ← ST(0);
DEST[ST(1)] ← ST(1);
DEST[ST(2)] ← ST(2);
DEST[ST(3)] ← ST(3);
DEST[ST(4)] ← ST(4);
DEST[ST(5)] ← ST(5);
DEST[ST(6)] ← ST(6);
DEST[ST(7)] ← ST(7);

(* Initialize FPU *)
FPUControlWord ← 037FH;
FPUStatusWord ← 0;
FPUTagWord ← FFFFH;
FPUDataPointer ← 0;
FPUInstructionPointer ← 0;
FPULastInstructionOpcode ← 0;

FPU Flags Affected 

The C0, C1, C2, and C3 flags are saved and then cleared.

Floating-Point Exceptions 

None. 

Protected Mode Exceptions 

#GP(0) If destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#UD If the LOCK prefix is used. 
Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If the LOCK prefix is used. 


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 


#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
FSCALE-Scale


Opcode    Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 FD     FSCALE         Valid          Valid              Scale ST(0) by ST(1).


Description 

Truncates the value in the source operand (toward 0) to an integral value and adds that value to the exponent of
the destination operand. The destination and source operands are floating-point values located in registers ST(0)
and ST(1), respectively. This instruction provides rapid multiplication or division by integral powers of 2. The
following table shows the results obtained when scaling various classes of numbers, assuming that neither
overflow nor underflow occurs.


Table 3-34. FSCALE Results

                              ST(1)
ST(0)    -∞     -F     -0     +0     +F     +∞     NaN
-∞       NaN    -∞     -∞     -∞     -∞     -∞     NaN
-F       -0     -F     -F     -F     -F     -∞     NaN
-0       -0     -0     -0     -0     -0     NaN    NaN
+0       +0     +0     +0     +0     +0     NaN    NaN
+F       +0     +F     +F     +F     +F     +∞     NaN
+∞       NaN    +∞     +∞     +∞     +∞     +∞     NaN
NaN      NaN    NaN    NaN    NaN    NaN    NaN    NaN


NOTES: 

F Means finite floating-point value. 


In most cases, only the exponent is changed and the mantissa (significand) remains unchanged. However, when
the value being scaled in ST(0) is a denormal value, the mantissa is also changed and the result may turn out to be
a normalized number. Similarly, if overflow or underflow results from a scale operation, the resulting mantissa will
differ from the source's mantissa.
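A Python sketch of the scaling rule, including the ST(1) = ±∞ columns of Table 3-34, follows. It is an illustration under stated assumptions: doubles stand in for register values, NaN is returned wherever the table shows NaN, and `math.ldexp` raising OverflowError loosely plays the role of the #O case. The function name is hypothetical.

```python
import math


def fscale_value(st0: float, st1: float) -> float:
    """Model ST(0) <- ST(0) * 2**trunc(ST(1)), following Table 3-34."""
    if math.isnan(st0) or math.isnan(st1):
        return math.nan
    if st1 == math.inf:
        if st0 == 0:
            return math.nan                       # +/-0 scaled by 2**+inf
        return math.copysign(math.inf, st0)
    if st1 == -math.inf:
        if math.isinf(st0):
            return math.nan                       # +/-inf scaled by 2**-inf
        return math.copysign(0.0, st0)
    # Finite ST(1): truncate toward zero and add it to the exponent. ldexp leaves
    # the significand alone unless the result is denormal or out of range.
    return math.ldexp(st0, math.trunc(st1))
```

For example, `fscale_value(3.0, 2.9)` truncates 2.9 to 2 and returns 12.0, and `fscale_value(3.0, -1.7)` truncates to -1 and returns 1.5.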

The FSCALE instruction can also be used to reverse the action of the FXTRACT instruction, as shown in the following 
example: 

FXTRACT;
FSCALE;
FSTP ST(1);

In this example, the FXTRACT instruction extracts the significand and exponent from the value in ST(0) and stores 
them in ST(0) and ST(1) respectively. The FSCALE then scales the significand in ST(0) by the exponent in ST(1), 
recreating the original value before the FXTRACT operation was performed. The FSTP ST(1) instruction overwrites
the exponent (extracted by the FXTRACT instruction) with the recreated value, which returns the stack to its
original state with only one register [ST(0)] occupied.
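The three-instruction sequence can be traced with a small Python model. The sketch assumes a finite, nonzero source (FXTRACT's handling of zero and other special values is not modeled), uses a list whose last element is ST(0), and derives FXTRACT's significand in [1, 2) from `math.frexp`, which returns a significand in [0.5, 1). The function names are hypothetical.

```python
import math


def fxtract(stack: list) -> None:
    """ST(0) <- unbiased exponent, then push the significand (finite, nonzero x only)."""
    m, e = math.frexp(stack[-1])      # x == m * 2**e with 0.5 <= |m| < 1
    stack[-1] = float(e - 1)          # exponent, ends up in ST(1)
    stack.append(m * 2.0)             # significand with 1 <= |m| < 2, new ST(0)


def fscale(stack: list) -> None:
    """ST(0) <- ST(0) * 2**trunc(ST(1)) for finite operands."""
    stack[-1] = math.ldexp(stack[-1], math.trunc(stack[-2]))


def fstp_st1(stack: list) -> None:
    """FSTP ST(1): copy ST(0) into ST(1), then pop."""
    stack[-2] = stack[-1]
    stack.pop()
```

Starting from `[10.0]`, `fxtract` leaves `[3.0, 1.25]` (exponent below significand), `fscale` turns it into `[3.0, 10.0]`, and `fstp_st1` restores `[10.0]`.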

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(0) ← ST(0) * 2^RoundTowardZero(ST(1));


FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.
Floating-Point Exceptions


#IS Stack underflow occurred.

#IA Source operand is an SNaN value or unsupported format.

#D Source operand is a denormal value.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions 


#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FSIN-Sine


Opcode    Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 FE     FSIN           Valid          Valid              Replace ST(0) with the approximation of its sine.


Description 

Computes an approximation of the sine of the source operand in register ST(0) and stores the result in ST(0). The
source operand must be given in radians and must be within the range -2^63 to +2^63. The following table shows the
results obtained when taking the sine of various classes of numbers, assuming that underflow does not occur.


Table 3-35. FSIN Results

SRC (ST(0))    DEST (ST(0))
-∞             *
-F             -1 to +1
-0             -0
+0             +0
+F             -1 to +1
+∞             *
NaN            NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in
register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of
range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range
-2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π.
However, even within the range -2^63 to +2^63, inaccurate results can occur because the finite approximation of π
used internally for argument reduction is not sufficient in all cases. Therefore, for accurate results it is safe to apply
FSIN only to arguments reduced accurately in software, to a value smaller in absolute value than 3π/4. See the
sections titled "Approximation of Pi" and "Transcendental Instruction Accuracy" in Chapter 8 of the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in
performing such reductions.
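A Python sketch of the C2 check with a software fallback follows. It models only finite sources, uses `math.sin` in place of the hardware approximation, and reduces out-of-range values with `math.remainder(x, 2 * math.pi)`. Because `2 * math.pi` is itself a rounded approximation of 2π, this fallback shows the control flow only; it does not provide the accurate reduction recommended in the paragraph above. The function names are hypothetical.

```python
import math

TWO_63 = 2.0 ** 63


def fsin(st0: float):
    """Model FSIN for a finite ST(0); returns (new ST(0), C2)."""
    if -TWO_63 < st0 < TWO_63:
        return math.sin(st0), 0
    return st0, 1                       # out of range: ST(0) unchanged, C2 = 1


def sin_with_fallback(x: float) -> float:
    result, c2 = fsin(x)
    if c2:
        # C2 set: subtract an integer multiple of (approximately) 2*pi and retry.
        result, c2 = fsin(math.remainder(x, 2 * math.pi))
    return result
```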

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF -2^63 < ST(0) < 2^63
    THEN
        C2 ← 0;
        ST(0) ← fsin(ST(0)); // approximation of the mathematical sin function
    ELSE (* Source operand out of range *)
        C2 ← 1;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C2 Set to 1 if outside range (-2^63 < source operand < +2^63); otherwise, set to 0.

C0, C3 Undefined.
Floating-Point Exceptions

#IS Stack underflow occurred.

#IA Source operand is an SNaN value, or unsupported format. 

#D Source operand is a denormal value. 

#P Value cannot be represented exactly in destination format. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FSINCOS—Sine and Cosine


Opcode    Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 FB     FSINCOS        Valid          Valid              Compute the sine and cosine of ST(0); replace ST(0) with the approximate sine, and push the approximate cosine onto the register stack.


Description 

Computes both the approximate sine and the cosine of the source operand in register ST(0), stores the sine in 
ST(0), and pushes the cosine onto the top of the FPU register stack. (This instruction is faster than executing the 
FSIN and FCOS instructions in succession.) 
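The stack effect given in the Operation section can be modeled in Python as a sketch. It assumes a finite source, represents the register stack as a list whose last element is ST(0), substitutes `math.sin` and `math.cos` for the hardware approximations, and ignores stack overflow; the function name is hypothetical.

```python
import math

TWO_63 = 2.0 ** 63


def fsincos(stack: list) -> int:
    """Model FSINCOS on a finite ST(0) (stack[-1]); returns the new C2 value."""
    x = stack[-1]
    if not -TWO_63 < x < TWO_63:
        return 1                    # out of range: stack unchanged, C2 = 1
    temp = math.cos(x)              # TEMP <- fcos(ST(0))
    stack[-1] = math.sin(x)         # ST(0) <- fsin(ST(0))
    stack.append(temp)              # TOP <- TOP - 1; ST(0) <- TEMP
    return 0
```

After a successful call the cosine is on top of the stack and the sine sits one register below it, so in this list representation `stack` ends as `[sin(x), cos(x)]`.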

The source operand must be given in radians and must be within the range -2^63 to +2^63. The following table shows
the results obtained when taking the sine and cosine of various classes of numbers, assuming that underflow does
not occur.


Table 3-36. FSINCOS Results

               DEST
SRC ST(0)      ST(1) Cosine    ST(0) Sine
-∞             *               *
-F             -1 to +1        -1 to +1
-0             +1              -0
+0             +1              +0
+F             -1 to +1        -1 to +1
+∞             *               *
NaN            NaN             NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception.

If the source operand is outside the acceptable range, the C2 flag in the FPU status word is set, and the value in
register ST(0) remains unchanged. The instruction does not raise an exception when the source operand is out of
range. It is up to the program to check the C2 flag for out-of-range conditions. Source values outside the range
-2^63 to +2^63 can be reduced to the range of the instruction by subtracting an appropriate integer multiple of 2π.
However, even within the range -2^63 to +2^63, inaccurate results can occur because the finite approximation of π
used internally for argument reduction is not sufficient in all cases. Therefore, for accurate results it is safe to apply
FSINCOS only to arguments reduced accurately in software, to a value smaller in absolute value than 3π/8. See the
sections titled "Approximation of Pi" and "Transcendental Instruction Accuracy" in Chapter 8 of the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 1, for a discussion of the proper value to use for π in
performing such reductions.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 
Operation

IF ST(0) < 2^63
    THEN
        C2 ← 0;
        TEMP ← fcos(ST(0)); // approximation of cosine
        ST(0) ← fsin(ST(0)); // approximation of sine
        TOP ← TOP - 1;
        ST(0) ← TEMP;
    ELSE (* Source operand out of range *)
        C2 ← 1;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred; set to 1 if stack overflow occurs.

Set if result was rounded up; cleared otherwise.

C2 Set to 1 if outside range (-2^63 < source operand < +2^63); otherwise, set to 0.

C0, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow or overflow occurred.

#IA Source operand is an SNaN value, or unsupported format. 

#D Source operand is a denormal value. 

#U Result is too small for destination format. 

#P Value cannot be represented exactly in destination format. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FSQRT—Square Root


Opcode    Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 FA     FSQRT          Valid          Valid              Computes square root of ST(0) and stores the result in ST(0).


Description 

Computes the square root of the source value in the ST(0) register and stores the result in ST(0). 

The following table shows the results obtained when taking the square root of various classes of numbers, 
assuming that neither overflow nor underflow occurs. 


Table 3-37. FSQRT Results

SRC (ST(0))    DEST (ST(0))
-∞             *
-F             *
-0             -0
+0             +0
+F             +F
+∞             +∞
NaN            NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 
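Table 3-37 maps closely onto Python's `math.sqrt`. In this sketch, returning NaN stands in for the masked response to #IA (an assumption; an unmasked #IA would trap instead), and the function name is hypothetical.

```python
import math


def fsqrt(x: float) -> float:
    """Model Table 3-37: negative sources other than -0 are invalid."""
    if math.isnan(x):
        return x
    if x < 0:                 # -F and -inf; note that -0.0 < 0 is False
        return math.nan       # where FSQRT signals #IA
    return math.sqrt(x)       # sqrt(-0.0) is -0.0 and sqrt(inf) is inf
```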

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(0) ← SquareRoot(ST(0));

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Source operand is an SNaN value or unsupported format. 

Source operand is a negative value (except for -0). 

#D Source operand is a denormal value. 

#P Value cannot be represented exactly in destination format. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 
Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FST/FSTP—Store Floating Point Value


Opcode     Instruction    64-Bit Mode    Compat/Leg Mode    Description
D9 /2      FST m32fp      Valid          Valid              Copy ST(0) to m32fp.
DD /2      FST m64fp      Valid          Valid              Copy ST(0) to m64fp.
DD D0+i    FST ST(i)      Valid          Valid              Copy ST(0) to ST(i).
D9 /3      FSTP m32fp     Valid          Valid              Copy ST(0) to m32fp and pop register stack.
DD /3      FSTP m64fp     Valid          Valid              Copy ST(0) to m64fp and pop register stack.
DB /7      FSTP m80fp     Valid          Valid              Copy ST(0) to m80fp and pop register stack.
DD D8+i    FSTP ST(i)     Valid          Valid              Copy ST(0) to ST(i) and pop register stack.


Description 

The FST instruction copies the value in the ST(0) register to the destination operand, which can be a memory
location or another register in the FPU register stack. When storing the value in memory, the value is converted to
single-precision or double-precision floating-point format.

The FSTP instruction performs the same operation as the FST instruction and then pops the register stack. To pop 
the register stack, the processor marks the ST(0) register as empty and increments the stack pointer (TOP) by 1. 
The FSTP instruction can also store values in memory in double extended-precision floating-point format. 

If the destination operand is a memory location, the operand specifies the address where the first byte of the
destination value is to be stored. If the destination operand is a register, the operand specifies a register in the register
stack relative to the top of the stack.

If the destination size is single-precision or double-precision, the significand of the value being stored is rounded
to the width of the destination (according to the rounding mode specified by the RC field of the FPU control word),
and the exponent is converted to the width and bias of the destination format. If the value being stored is too large
for the destination format, a numeric overflow exception (#O) is generated and, if the exception is unmasked, no
value is stored in the destination operand. If the value being stored is a denormal value, the denormal exception
(#D) is not generated. This condition is simply signaled as a numeric underflow exception (#U) condition.
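As a rough illustration of the single-precision store path, the Python sketch below narrows a value through the standard `struct` module's 4-byte `'f'` format. It assumes the source is a Python float (double precision, not the x87 double extended-precision register format) and that RC selects round to nearest; `store_m32fp`, `load_m32fp`, and `is_single_denormal` are hypothetical helper names. Where the hardware would signal #O, `struct.pack` raises `OverflowError`; it does no #U reporting at all.

```python
import struct


def store_m32fp(value: float) -> bytes:
    """Round a double to single precision and return the 4 stored bytes."""
    # The 'f' format rounds to the nearest representable single-precision
    # value and raises OverflowError if the rounded result would be infinite.
    return struct.pack('<f', value)


def load_m32fp(raw: bytes) -> float:
    """Read the 4 bytes back as a Python float (widening is exact)."""
    return struct.unpack('<f', raw)[0]


def is_single_denormal(raw: bytes) -> bool:
    """True if the stored value has a zero exponent field and a nonzero fraction."""
    bits = struct.unpack('<I', raw)[0]
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    return exponent == 0 and fraction != 0


print(load_m32fp(store_m32fp(0.1)))            # 0.1 rounded to a 24-bit significand
print(is_single_denormal(store_m32fp(1e-40)))  # too small for a normal single value

try:
    store_m32fp(1e39)                           # too large for single precision
except OverflowError as exc:
    print('overflow:', exc)
```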

If the value being stored is ±0, ±∞, or a NaN, the least-significant bits of the significand and the exponent are
truncated to fit the destination format. This operation preserves the value's identity as a 0, ∞, or NaN.
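The identity-preserving behavior for ±0, ±∞, and NaNs can be observed by round-tripping values through Python's `struct` single-precision format. `narrow_to_single` is a hypothetical helper, and the sketch assumes Python doubles as the source format rather than x87 registers.

```python
import math
import struct


def narrow_to_single(value: float) -> float:
    """Round-trip a value through the 4-byte single-precision encoding."""
    return struct.unpack('<f', struct.pack('<f', value))[0]


# Special values keep their class, and zeros and infinities keep their sign.
for v in (0.0, -0.0, math.inf, -math.inf, math.nan):
    r = narrow_to_single(v)
    print(v, '->', r, 'negative sign bit:', math.copysign(1.0, r) < 0)
```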

If the destination operand is a non-empty register, the invalid-operation exception is not generated. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

DEST ← ST(0);
IF Instruction = FSTP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Indicates rounding direction if the floating-point inexact exception (#P) is generated: 0 ← not roundup; 1 ← roundup.

C0, C2, C3 Undefined.
Floating-Point Exceptions

#IS Stack underflow occurred.

#IA If destination result is an SNaN value or unsupported format, except when the destination format is in double extended-precision floating-point format.

#U Result is too small for the destination format.

#O Result is too large for the destination format.

#P Value cannot be represented exactly in destination format.


Protected Mode Exceptions 

#GP(0) If the destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#MF If there is a pending x87 FPU exception.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
FSTCW/FNSTCW-Store x87 FPU Control Word


Opcode     Instruction       64-Bit Mode   Compat/Leg Mode   Description
9B D9 /7   FSTCW m2byte      Valid         Valid             Store FPU control word to m2byte after checking for
                                                             pending unmasked floating-point exceptions.
D9 /7      FNSTCW* m2byte    Valid         Valid             Store FPU control word to m2byte without checking for
                                                             pending unmasked floating-point exceptions.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Stores the current value of the FPU control word at the specified destination in memory. The FSTCW instruction 
checks for and handles pending unmasked floating-point exceptions before storing the control word; the FNSTCW 
instruction does not. 
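To show what the stored 16-bit image contains, here is a hedged Python sketch that decodes its exception-mask, precision-control, and rounding-control fields. The bit positions used (mask bits 0 to 5, PC in bits 8 and 9, RC in bits 10 and 11) follow the x87 control-word layout described in Volume 1, Chapter 8; `decode_control_word` and the name tables are hypothetical, and 037FH is the value FINIT loads into the control word.

```python
# Field positions in the 16-bit x87 control word (as described in Volume 1).
MASK_NAMES = ('IM', 'DM', 'ZM', 'OM', 'UM', 'PM')  # exception masks, bits 0..5
PC_NAMES = {0b00: 'single', 0b01: 'reserved', 0b10: 'double', 0b11: 'double extended'}
RC_NAMES = {0b00: 'to nearest', 0b01: 'down', 0b10: 'up', 0b11: 'toward zero'}


def decode_control_word(cw: int) -> dict:
    """Split a stored control word into its mask, PC, and RC fields."""
    return {
        'masked': [name for bit, name in enumerate(MASK_NAMES) if (cw >> bit) & 1],
        'precision': PC_NAMES[(cw >> 8) & 0b11],
        'rounding': RC_NAMES[(cw >> 10) & 0b11],
    }


print(decode_control_word(0x037F))  # the control word left by FINIT
```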

The assembler issues two instructions for the FSTCW instruction (an FWAIT instruction followed by an FNSTCW
instruction), and the processor executes each of these instructions separately. If an exception is generated for
either of these instructions, the saved EIP points to the instruction that caused the exception.
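The FWAIT-plus-no-wait pairing is visible in the bytes an assembler emits. The sketch below is a hypothetical Python encoder that handles only a 32-bit register-indirect operand with no displacement, such as [EAX]. It takes the opcode bytes and /digit values from the opcode tables in this chapter, excludes ESP and EBP (whose ModR/M encodings select a SIB byte or a displacement form instead), and is not a general-purpose assembler.

```python
FWAIT = 0x9B
NO_WAIT_FORMS = {            # mnemonic: (opcode byte, /digit placed in ModR/M.reg)
    'FNSTCW': (0xD9, 7),
    'FNSTENV': (0xD9, 6),
    'FNSTSW': (0xDD, 7),
}
BASE_REGS = {'eax': 0, 'ecx': 1, 'edx': 2, 'ebx': 3, 'esi': 6, 'edi': 7}


def encode(mnemonic: str, base: str) -> bytes:
    """Encode e.g. FSTCW [eax]; wait forms get a leading FWAIT byte."""
    name = mnemonic.upper()
    wait_form = not name.startswith('FN')
    if wait_form:
        name = 'FN' + name[1:]               # FSTCW -> FNSTCW, FSTSW -> FNSTSW
    opcode, digit = NO_WAIT_FORMS[name]
    modrm = (0b00 << 6) | (digit << 3) | BASE_REGS[base.lower()]  # mod=00: [reg]
    encoded = bytes([opcode, modrm])
    return bytes([FWAIT]) + encoded if wait_form else encoded


print(encode('FSTCW', 'eax').hex(' '))
print(encode('FNSTCW', 'eax').hex(' '))
```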

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual
circumstances) for an FNSTCW instruction to be interrupted prior to being executed to handle a pending FPU
exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances.
An FNSTCW instruction cannot be interrupted in this way on later Intel processors, except for the Intel
Quark™ X1000 processor.

Operation 

DEST ← FPUControlWord;

FPU Flags Affected 

The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions 

None. 


Protected Mode Exceptions 

#GP(0) If the destination is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used.
Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FSTENV/FNSTENV-Store x87 FPU Environment


Opcode     Instruction            64-Bit Mode   Compat/Leg Mode   Description
9B D9 /6   FSTENV m14/28byte      Valid         Valid             Store FPU environment to m14byte or m28byte after
                                                                  checking for pending unmasked floating-point
                                                                  exceptions. Then mask all floating-point exceptions.
D9 /6      FNSTENV* m14/28byte    Valid         Valid             Store FPU environment to m14byte or m28byte without
                                                                  checking for pending unmasked floating-point
                                                                  exceptions. Then mask all floating-point exceptions.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Saves the current FPU operating environment at the memory location specified with the destination operand, and
then masks all floating-point exceptions. The FPU operating environment consists of the FPU control word, status
word, tag word, instruction pointer, data pointer, and last opcode. Figures 8-9 through 8-12 in the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 1, show the layout in memory of the stored environment,
depending on the operating mode of the processor (protected or real) and the current operand-size attribute
(16-bit or 32-bit). In virtual-8086 mode, the real mode layouts are used.
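A sketch of reading a saved 28-byte image in Python follows. It assumes the 32-bit protected-mode layout (taken here to be the one in Figure 8-9 of Volume 1): the control, status, and tag words each padded to 4 bytes, then the instruction-pointer offset, its selector followed by a word whose low 11 bits hold the last opcode, and finally the data-pointer offset and selector. `ENV32`, `FpuEnv`, `parse_env32`, and `control_after_fnstenv` are hypothetical names; the 16-bit and real-mode formats differ and are not handled.

```python
import struct
from typing import NamedTuple

# Assumed 28-byte image: FCW, FSW, FTW (each padded to 4 bytes), FPU IP offset,
# FPU IP selector, word holding the 11-bit opcode, FPU data pointer offset,
# FPU data pointer selector, 2 reserved bytes.
ENV32 = struct.Struct('<HxxHxxHxxIHHIHxx')


class FpuEnv(NamedTuple):
    control: int
    status: int
    tag: int
    ip_offset: int
    ip_selector: int
    opcode: int
    dp_offset: int
    dp_selector: int


def parse_env32(image: bytes) -> FpuEnv:
    """Unpack a 28-byte environment image into its named fields."""
    cw, sw, tw, ip, cs, op_word, dp, ds = ENV32.unpack(image)
    return FpuEnv(cw, sw, tw, ip, cs, op_word & 0x7FF, dp, ds)


def control_after_fnstenv(cw: int) -> int:
    """Control word left in the FPU after the save: all six exception masks set."""
    return cw | 0x3F
```

Masking afterwards changes only the live control word; the saved image still holds the caller's original masks, so a handler can restore them later with FLDENV.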

The FSTENV instruction checks for and handles any pending unmasked floating-point exceptions before storing 
the FPU environment; the FNSTENV instruction does not. The saved image reflects the state of the FPU after all 
floating-point instructions preceding the FSTENV/FNSTENV instruction in the instruction stream have been 
executed. 

These instructions are often used by exception handlers because they provide access to the FPU instruction and
data pointers. The environment is typically saved on the stack. Masking all exceptions after saving the environment
prevents floating-point exceptions from interrupting the exception handler.

The assembler issues two instructions for the FSTENV instruction (an FWAIT instruction followed by an FNSTENV
instruction), and the processor executes each of these instructions separately. If an exception is generated for
either of these instructions, the saved EIP points to the instruction that caused the exception.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual
circumstances) for an FNSTENV instruction to be interrupted prior to being executed to handle a pending FPU
exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances.
An FNSTENV instruction cannot be interrupted in this way on later Intel processors, except for the Intel
Quark™ X1000 processor.

Operation 

DEST[FPUControlWord] ← FPUControlWord;
DEST[FPUStatusWord] ← FPUStatusWord;
DEST[FPUTagWord] ← FPUTagWord;
DEST[FPUDataPointer] ← FPUDataPointer;
DEST[FPUInstructionPointer] ← FPUInstructionPointer;
DEST[FPULastInstructionOpcode] ← FPULastInstructionOpcode;

FPU Flags Affected 

The C0, C1, C2, and C3 flags are undefined.
Floating-Point Exceptions

None 


Protected Mode Exceptions 

#GP(0) If the destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FSTSW/FNSTSW-Store x87 FPU Status Word


Opcode     Instruction       64-Bit Mode   Compat/Leg Mode   Description
9B DD /7   FSTSW m2byte      Valid         Valid             Store FPU status word at m2byte after checking for
                                                             pending unmasked floating-point exceptions.
9B DF E0   FSTSW AX          Valid         Valid             Store FPU status word in AX register after checking for
                                                             pending unmasked floating-point exceptions.
DD /7      FNSTSW* m2byte    Valid         Valid             Store FPU status word at m2byte without checking for
                                                             pending unmasked floating-point exceptions.
DF E0      FNSTSW* AX        Valid         Valid             Store FPU status word in AX register without checking
                                                             for pending unmasked floating-point exceptions.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Description 

Stores the current value of the x87 FPU status word in the destination location. The destination operand can be 
either a two-byte memory location or the AX register. The FSTSW instruction checks for and handles pending 
unmasked floating-point exceptions before storing the status word; the FNSTSW instruction does not. 

The FNSTSW AX form of the instruction is used primarily in conditional branching (for instance, after an FPU
comparison instruction or an FPREM, FPREM1, or FXAM instruction), where the direction of the branch depends on
the state of the FPU condition code flags. (See the section titled "Branching and Conditional Moves on FPU Condition
Codes" in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1.) This
instruction can also be used to invoke exception handlers (by examining the exception flags) in environments that
do not use interrupts. When the FNSTSW AX instruction is executed, the AX register is updated before the
processor executes any further instructions. The status stored in the AX register is thus guaranteed to be from the
completion of the prior FPU instruction.
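A common branching idiom pairs FNSTSW AX with SAHF, which copies AH into the low byte of EFLAGS so that C0, C2, and C3 land in CF, PF, and ZF. The Python sketch below models only that bit movement. The status-word positions it uses (C0 bit 8, C1 bit 9, C2 bit 10, TOP bits 11 to 13, C3 bit 14) are the standard x87 layout, and `condition_codes` and `sahf_flags` are hypothetical helpers.

```python
def condition_codes(sw: int) -> dict:
    """Pick the condition codes and TOP out of a stored x87 status word."""
    return {
        'C0': (sw >> 8) & 1,
        'C1': (sw >> 9) & 1,
        'C2': (sw >> 10) & 1,
        'TOP': (sw >> 11) & 0b111,
        'C3': (sw >> 14) & 1,
    }


def sahf_flags(ax: int) -> dict:
    """Model SAHF after FNSTSW AX: AH bit 0 -> CF, bit 2 -> PF, bit 6 -> ZF."""
    ah = (ax >> 8) & 0xFF
    return {'CF': ah & 1, 'PF': (ah >> 2) & 1, 'ZF': (ah >> 6) & 1}


print(sahf_flags(0x4500))  # C3 = C2 = C0 = 1, the "unordered" result
```

With this mapping, a compare that sets C3 also sets ZF (so JE/JZ tests equality), one that sets C0 sets CF (so JB tests "less than"), and an unordered result sets PF (so JP detects it).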

The assembler issues two instructions for the FSTSW instruction (an FWAIT instruction followed by an FNSTSW
instruction), and the processor executes each of these instructions separately. If an exception is generated for
either of these instructions, the saved EIP points to the instruction that caused the exception.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

When operating a Pentium or Intel486 processor in MS-DOS compatibility mode, it is possible (under unusual
circumstances) for an FNSTSW instruction to be interrupted prior to being executed to handle a pending FPU
exception. See the section titled "No-Wait FPU Instructions Can Get FPU Interrupt in Window" in Appendix D of the
Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for a description of these circumstances.
An FNSTSW instruction cannot be interrupted in this way on later Intel processors, except for the Intel
Quark™ X1000 processor.

Operation 

DEST ← FPUStatusWord;

FPU Flags Affected 

The C0, C1, C2, and C3 flags are undefined.

Floating-Point Exceptions 

None 
Protected Mode Exceptions

#GP(0) If the destination is located in a non-writable segment. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FSUB/FSUBP/FISUB-Subtract


Opcode     Instruction          64-Bit Mode   Compat/Leg Mode   Description
D8 /4      FSUB m32fp           Valid         Valid             Subtract m32fp from ST(0) and store result in ST(0).
DC /4      FSUB m64fp           Valid         Valid             Subtract m64fp from ST(0) and store result in ST(0).
D8 E0+i    FSUB ST(0), ST(i)    Valid         Valid             Subtract ST(i) from ST(0) and store result in ST(0).
DC E8+i    FSUB ST(i), ST(0)    Valid         Valid             Subtract ST(0) from ST(i) and store result in ST(i).
DE E8+i    FSUBP ST(i), ST(0)   Valid         Valid             Subtract ST(0) from ST(i), store result in ST(i), and
                                                                pop register stack.
DE E9      FSUBP                Valid         Valid             Subtract ST(0) from ST(1), store result in ST(1), and
                                                                pop register stack.
DA /4      FISUB m32int         Valid         Valid             Subtract m32int from ST(0) and store result in ST(0).
DE /4      FISUB m16int         Valid         Valid             Subtract m16int from ST(0) and store result in ST(0).


Description 

Subtracts the source operand from the destination operand and stores the difference in the destination location. 
The destination operand is always an FPU data register; the source operand can be a register or a memory location. 
Source operands in memory can be in single-precision or double-precision floating-point format or in word or 
doubleword integer format. 

The no-operand version of the instruction subtracts the contents of the ST(0) register from the ST(1) register and
stores the result in ST(1). The one-operand version subtracts the contents of a memory location (either a floating-point
or an integer value) from the contents of the ST(0) register and stores the result in ST(0). The two-operand
version subtracts the contents of the ST(0) register from the ST(i) register or vice versa.

The FSUBP instructions perform the additional operation of popping the FPU register stack following the subtraction.
To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer
(TOP) by 1. The no-operand version of the floating-point subtract instructions always results in the register stack
being popped. In some assemblers, the mnemonic for this instruction is FSUB rather than FSUBP.

The FISUB instructions convert an integer source operand to double extended-precision floating-point format 
before performing the subtraction. 
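For the integer forms, the memory operand is a little-endian two's-complement word (m16int) or doubleword (m32int). The hypothetical Python helpers below decode such an operand and convert it before subtracting. Every 16-bit and 32-bit integer is exactly representable in a Python float, as it is in double extended-precision format, so the conversion step itself never rounds; the subtraction here uses Python double arithmetic rather than the x87 register format.

```python
def load_int_operand(raw: bytes) -> float:
    """Decode an m16int (2 bytes) or m32int (4 bytes) operand and convert it exactly."""
    if len(raw) not in (2, 4):
        raise ValueError('expected a 2-byte or 4-byte integer operand')
    return float(int.from_bytes(raw, 'little', signed=True))


def fisub(st0: float, raw: bytes) -> float:
    """ST(0) minus the converted integer operand."""
    return st0 - load_int_operand(raw)


m16 = (-3).to_bytes(2, 'little', signed=True)  # a word operand holding -3
print(fisub(10.0, m16))
```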

Table 3-38 shows the results obtained when subtracting various classes of numbers from one another, assuming
that neither overflow nor underflow occurs. Here, the SRC value is subtracted from the DEST value (DEST - SRC =
result).

When the difference between two operands of like sign is 0, the result is +0, except for the round toward -∞ mode,
in which case the result is -0. This instruction also guarantees that +0 - (-0) = +0, and that -0 - (+0) = -0. When the
source operand is an integer 0, it is treated as a +0.

When one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an
invalid-operation exception is generated.
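Python floats follow IEEE 754 double-precision rules with round to nearest, so they reproduce the signed-zero and infinity cases described above, though not the x87 extended-precision intermediate or its exception signaling. In this sketch `fsub`, `is_invalid_sub`, and `sign` are hypothetical helpers. Where the hardware raises #IA for infinities of like sign, Python quietly returns a NaN, which is roughly what the masked response produces.

```python
import math


def fsub(dest: float, src: float) -> float:
    """DEST - SRC in IEEE double arithmetic (round to nearest)."""
    return dest - src


def is_invalid_sub(dest: float, src: float) -> bool:
    """True for infinities of like sign, the case x87 reports as #IA."""
    return math.isinf(dest) and math.isinf(src) and (dest > 0) == (src > 0)


def sign(x: float) -> str:
    """Sign of a value, distinguishing -0.0 from +0.0."""
    return '-' if math.copysign(1.0, x) < 0 else '+'


print(sign(fsub(0.0, -0.0)), sign(fsub(-0.0, 0.0)), sign(fsub(5.0, 5.0)))
print(fsub(math.inf, -math.inf), fsub(math.inf, math.inf))
```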
Table 3-38. FSUB/FSUBP/FISUB Results


                                             SRC
DEST      -∞      -F or -I     -0      +0      +F or +I     +∞      NaN
-∞        *       -∞           -∞      -∞      -∞           -∞      NaN
-F        +∞      ±F or ±0     DEST    DEST    -F           -∞      NaN
-0        +∞      -SRC         ±0      -0      -SRC         -∞      NaN
+0        +∞      -SRC         +0      ±0      -SRC         -∞      NaN
+F        +∞      +F           DEST    DEST    ±F or ±0     -∞      NaN
+∞        +∞      +∞           +∞      +∞      +∞           *       NaN
NaN       NaN     NaN          NaN     NaN     NaN          NaN     NaN


NOTES: 

F Means finite floating-point value. 

I Means Integer. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF Instruction = FISUB
    THEN
        DEST ← DEST - ConvertToDoubleExtendedPrecisionFP(SRC);
    ELSE (* Source operand is floating-point value *)
        DEST ← DEST - SRC;
FI;
IF Instruction = FSUBP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA Operand is an SNaN value or unsupported format.

Operands are infinities of like sign.

#D Source operand is a denormal value.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.
Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FSUBR/FSUBRP/FISUBR-Reverse Subtract


Opcode     Instruction           64-Bit Mode   Compat/Leg Mode   Description
D8 /5      FSUBR m32fp           Valid         Valid             Subtract ST(0) from m32fp and store result in ST(0).
DC /5      FSUBR m64fp           Valid         Valid             Subtract ST(0) from m64fp and store result in ST(0).
D8 E8+i    FSUBR ST(0), ST(i)    Valid         Valid             Subtract ST(0) from ST(i) and store result in ST(0).
DC E0+i    FSUBR ST(i), ST(0)    Valid         Valid             Subtract ST(i) from ST(0) and store result in ST(i).
DE E0+i    FSUBRP ST(i), ST(0)   Valid         Valid             Subtract ST(i) from ST(0), store result in ST(i), and
                                                                 pop register stack.
DE E1      FSUBRP                Valid         Valid             Subtract ST(1) from ST(0), store result in ST(1), and
                                                                 pop register stack.
DA /5      FISUBR m32int         Valid         Valid             Subtract ST(0) from m32int and store result in ST(0).
DE /5      FISUBR m16int         Valid         Valid             Subtract ST(0) from m16int and store result in ST(0).


Description 

Subtracts the destination operand from the source operand and stores the difference in the destination location. 
The destination operand is always an FPU register; the source operand can be a register or a memory location. 
Source operands in memory can be in single-precision or double-precision floating-point format or in word or 
doubleword integer format. 

These instructions perform the reverse operations of the FSUB, FSUBP, and FISUB instructions. They are provided 
to support more efficient coding. 

The no-operand version of the instruction subtracts the contents of the ST(1) register from the ST(0) register and
stores the result in ST(1). The one-operand version subtracts the contents of the ST(0) register from the contents
of a memory location (either a floating-point or an integer value) and stores the result in ST(0). The two-operand
version subtracts the contents of the ST(i) register from the ST(0) register or vice versa.
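The forward and reverse forms differ only in operand order. The sketch below models the register stack as a Python list with index 0 standing for ST(0). `fsubp`, `fsubrp`, `fsub_mem`, and `fsubr_mem` are hypothetical helpers for the no-operand encodings (DE E9 and DE E1) and the memory forms, and tags and exceptions are ignored. Popping first and then writing the new ST(0) gives the same final state as writing ST(1) and then popping.

```python
def fsubp(stack: list) -> None:
    """FSUBP (DE E9): ST(1) <- ST(1) - ST(0), then pop."""
    st0 = stack.pop(0)           # old ST(1) is now at index 0
    stack[0] = stack[0] - st0


def fsubrp(stack: list) -> None:
    """FSUBRP (DE E1): ST(1) <- ST(0) - ST(1), then pop."""
    st0 = stack.pop(0)
    stack[0] = st0 - stack[0]


def fsub_mem(stack: list, m: float) -> None:
    """FSUB m64fp: ST(0) <- ST(0) - m."""
    stack[0] = stack[0] - m


def fsubr_mem(stack: list, m: float) -> None:
    """FSUBR m64fp: ST(0) <- m - ST(0)."""
    stack[0] = m - stack[0]


regs = [2.0, 10.0]               # ST(0) = 2.0, ST(1) = 10.0
fsubrp(regs)
print(regs)
```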

The FSUBRP instructions perform the additional operation of popping the FPU register stack following the subtraction.
To pop the register stack, the processor marks the ST(0) register as empty and increments the stack pointer
(TOP) by 1. The no-operand version of the floating-point reverse subtract instructions always results in the register
stack being popped. In some assemblers, the mnemonic for this instruction is FSUBR rather than FSUBRP.

The FISUBR instructions convert an integer source operand to double extended-precision floating-point format 
before performing the subtraction. 

The following table shows the results obtained when subtracting various classes of numbers from one another,
assuming that neither overflow nor underflow occurs. Here, the DEST value is subtracted from the SRC value (SRC
- DEST = result).

When the difference between two operands of like sign is 0, the result is +0, except for the round toward -∞ mode,
in which case the result is -0. This instruction also guarantees that +0 - (-0) = +0, and that -0 - (+0) = -0. When the
source operand is an integer 0, it is treated as a +0.

When one operand is ∞, the result is ∞ of the expected sign. If both operands are ∞ of the same sign, an
invalid-operation exception is generated.
Table 3-39. FSUBR/FSUBRP/FISUBR Results


                                             SRC
DEST      -∞      -F or -I     -0      +0      +F or +I     +∞      NaN
-∞        *       +∞           +∞      +∞      +∞           +∞      NaN
-F        -∞      ±F or ±0     -DEST   -DEST   +F           +∞      NaN
-0        -∞      SRC          ±0      +0      SRC          +∞      NaN
+0        -∞      SRC          -0      ±0      SRC          +∞      NaN
+F        -∞      -F           -DEST   -DEST   ±F or ±0     +∞      NaN
+∞        -∞      -∞           -∞      -∞      -∞           *       NaN
NaN       NaN     NaN          NaN     NaN     NaN          NaN     NaN


NOTES: 

F Means finite floating-point value. 

I Means Integer. 

* Indicates floating-point invalid-arithmetic-operand (#IA) exception. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF Instruction = FISUBR
    THEN
        DEST ← ConvertToDoubleExtendedPrecisionFP(SRC) - DEST;
    ELSE (* Source operand is floating-point value *)
        DEST ← SRC - DEST;
FI;
IF Instruction = FSUBRP
    THEN
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.


Floating-Point Exceptions

#IS Stack underflow occurred.

#IA Operand is an SNaN value or unsupported format.

Operands are infinities of like sign.

#D Source operand is a denormal value.

#U Result is too small for destination format.

#O Result is too large for destination format.

#P Value cannot be represented exactly in destination format.
Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 


#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3.

#UD If the LOCK prefix is used. 
FTST-TEST


Opcode     Instruction   64-Bit Mode   Compat/Leg Mode   Description
D9 E4      FTST          Valid         Valid             Compare ST(0) with 0.0.


Description 

Compares the value in the ST(0) register with 0.0 and sets the condition code flags C0, C2, and C3 in the FPU
status word according to the results (see Table 3-40).


Table 3-40. FTST Results 


Condition       C3   C2   C0
ST(0) > 0.0     0    0    0
ST(0) < 0.0     0    0    1
ST(0) = 0.0     1    0    0
Unordered       1    1    1


This instruction performs an "unordered comparison." An unordered comparison also checks the class of the 
numbers being compared (see "FXAM—Examine Floating-Point" in this chapter). If the value in register ST(0) is a 
NaN or is in an undefined format, the condition flags are set to "unordered" and the invalid operation exception is 
generated. 

The sign of zero is ignored, so that -0.0 is treated as equal to +0.0.
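The classification in Table 3-40 can be sketched over Python floats as follows. `ftst` is a hypothetical helper that returns (C3, C2, C0). It treats any NaN as unordered and cannot represent the unsupported x87 formats, and it does not model the #IA signaling that accompanies the unordered result.

```python
import math


def ftst(st0: float) -> tuple:
    """Return (C3, C2, C0) as set by FTST for a value held in ST(0)."""
    if math.isnan(st0):
        return (1, 1, 1)      # unordered; the hardware also signals #IA
    if st0 > 0.0:
        return (0, 0, 0)
    if st0 < 0.0:
        return (0, 0, 1)
    return (1, 0, 0)          # +0.0 and -0.0 both compare equal to 0.0


for v in (2.0, -math.inf, -0.0, math.nan):
    print(v, ftst(v))
```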

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

CASE (relation of operands) OF
    Not comparable:  C3, C2, C0 ← 111;
    ST(0) > 0.0:     C3, C2, C0 ← 000;
    ST(0) < 0.0:     C3, C2, C0 ← 001;
    ST(0) = 0.0:     C3, C2, C0 ← 100;
ESAC;

FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 See Table 3-40.

Floating-Point Exceptions

#IS Stack underflow occurred.

#IA The source operand is a NaN value or is in an unsupported format. 

#D The source operand is a denormal value. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 
Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 




INSTRUCTION SET REFERENCE, A-L 


FUCOM/FUCOMP/FUCOMPP—Unordered Compare Floating Point Values 


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
DD E0+i | FUCOM ST(i) | Valid | Valid | Compare ST(0) with ST(i).
DD E1 | FUCOM | Valid | Valid | Compare ST(0) with ST(1).
DD E8+i | FUCOMP ST(i) | Valid | Valid | Compare ST(0) with ST(i) and pop register stack.
DD E9 | FUCOMP | Valid | Valid | Compare ST(0) with ST(1) and pop register stack.
DA E9 | FUCOMPP | Valid | Valid | Compare ST(0) with ST(1) and pop register stack twice.


Description 

Performs an unordered comparison of the contents of register ST(0) and ST(i) and sets condition code flags C0, C2,
and C3 in the FPU status word according to the results (see the table below). If no operand is specified, the
contents of registers ST(0) and ST(1) are compared. The sign of zero is ignored, so that -0.0 is equal to +0.0.


Table 3-41. FUCOM/FUCOMP/FUCOMPP Results

Comparison Results* | C3 | C2 | C0
ST(0) > ST(i) | 0 | 0 | 0
ST(0) < ST(i) | 0 | 0 | 1
ST(0) = ST(i) | 1 | 0 | 0
Unordered | 1 | 1 | 1

NOTES:

* Flags not set if unmasked invalid-arithmetic-operand (#IA) exception is generated.

An unordered comparison checks the class of the numbers being compared (see "FXAM—Examine Floating-Point" 
in this chapter). The FUCOM/FUCOMP/FUCOMPP instructions perform the same operations as the 
FCOM/FCOMP/FCOMPP instructions. The only difference is that the FUCOM/FUCOMP/FUCOMPP instructions raise 
the invalid-arithmetic-operand exception (#IA) only when either or both operands are an SNaN or are in an unsupported
format; QNaNs cause the condition code flags to be set to unordered, but do not cause an exception to be
generated. The FCOM/FCOMP/FCOMPP instructions raise an invalid-operation exception when either or both of the 
operands are a NaN value of any kind or are in an unsupported format. 

As with the FCOM/FCOMP/FCOMPP instructions, if the operation results in an invalid-arithmetic-operand exception 
being raised, the condition code flags are set only if the exception is masked. 
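A minimal Python sketch of the FUCOM condition codes and the QNaN/SNaN distinction described above. Python floats cannot tell a QNaN from an SNaN, so the hypothetical `st0_snan` and `src_snan` flags carry that information, and the hypothetical `im_masked` flag stands in for FPUControlWord.IM; unsupported formats and #D are not modeled.

```python
import math


def fucom(st0: float, src: float, *, st0_snan: bool = False,
          src_snan: bool = False, im_masked: bool = True) -> tuple:
    """Return ((C3, C2, C0) or None, ia_raised) for FUCOM ST(0), SRC.

    None means the condition code flags are left unwritten because #IA
    was raised while unmasked.
    """
    if st0_snan or src_snan:
        # SNaN operand: #IA is raised; flags are set only if #IA is masked.
        return ((1, 1, 1) if im_masked else None), True
    if math.isnan(st0) or math.isnan(src):
        return (1, 1, 1), False  # QNaN: unordered, no exception
    if st0 > src:
        return (0, 0, 0), False
    if st0 < src:
        return (0, 0, 1), False
    return (1, 0, 0), False  # equal, including -0.0 versus +0.0
```

An FCOM model would differ only in the QNaN branch, where it would also raise #IA.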

The FUCOMP instruction pops the register stack following the comparison operation and the FUCOMPP instruction 
pops the register stack twice following the comparison operation. To pop the register stack, the processor marks 
the ST(0) register as empty and increments the stack pointer (TOP) by 1. 
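The pop described above can be sketched with a hypothetical model of the eight physical registers, the TOP field, and per-register tags. This Python class is illustrative only (its names are not from this manual, and it tags every pushed value as valid for simplicity); FUCOMP would call `pop()` once after the comparison and FUCOMPP twice.

```python
EMPTY = 0b11  # 2-bit tag value for an empty register


class X87Stack:
    """Hypothetical model of physical registers R0-R7, TOP, and tags."""

    def __init__(self) -> None:
        self.regs = [0.0] * 8
        self.tags = [EMPTY] * 8
        self.top = 0

    def st(self, i: int) -> float:
        # ST(i) names physical register R((TOP + i) mod 8).
        return self.regs[(self.top + i) % 8]

    def push(self, value: float) -> None:
        self.top = (self.top - 1) % 8
        self.regs[self.top] = value
        self.tags[self.top] = 0b00  # simplified: always tagged valid

    def pop(self) -> None:
        # Mark ST(0) empty, then increment TOP by 1 (modulo 8).
        self.tags[self.top] = EMPTY
        self.top = (self.top + 1) % 8
```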

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 
Operation

CASE (relation of operands) OF
    ST > SRC: C3, C2, C0 ← 000;
    ST < SRC: C3, C2, C0 ← 001;
    ST = SRC: C3, C2, C0 ← 100;
ESAC;
IF ST(0) or SRC = QNaN, but not SNaN or unsupported format
    THEN
        C3, C2, C0 ← 111;
    ELSE (* ST(0) or SRC is SNaN or unsupported format *)
        #IA;
        IF FPUControlWord.IM = 1
            THEN
                C3, C2, C0 ← 111;
        FI;
FI;
IF Instruction = FUCOMP
    THEN
        PopRegisterStack;
FI;
IF Instruction = FUCOMPP
    THEN
        PopRegisterStack;
        PopRegisterStack;
FI;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.

C0, C2, C3 See Table 3-41.

Floating-Point Exceptions 

#IS Stack underflow occurred.

#IA One or both operands are SNaN values or have unsupported formats. Detection of a QNaN
value in and of itself does not raise an invalid-operand exception.

#D One or both operands are denormal values. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 
64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FXAM—Examine Floating-Point


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
D9 E5 | FXAM | Valid | Valid | Classify value or number in ST(0).


Description 

Examines the contents of the ST(0) register and sets the condition code flags C0, C2, and C3 in the FPU status word
to indicate the class of value or number in the register (see the table below).


Table 3-42. FXAM Results

Class | C3 | C2 | C0
Unsupported | 0 | 0 | 0
NaN | 0 | 0 | 1
Normal finite number | 0 | 1 | 0
Infinity | 0 | 1 | 1
Zero | 1 | 0 | 0
Empty | 1 | 0 | 1
Denormal number | 1 | 1 | 0


The C1 flag is set to the sign of the value in ST(0), regardless of whether the register is empty or full.
This instruction's operation is the same in non-64-bit modes and 64-bit mode. 
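The classification in Table 3-42 can be sketched as a decision procedure over the fields of an 80-bit register. In this hypothetical Python helper the caller passes the sign bit, the 15-bit biased exponent, and the 64-bit significand (explicit J bit at bit 63) already decoded, plus an `empty` flag for the tag. Treating unnormals, pseudo-NaNs, and pseudo-infinities (J bit clear) as unsupported and pseudo-denormals (exponent 0, J bit set) as denormal are assumptions of this sketch.

```python
def fxam(sign: int, exponent: int, significand: int,
         empty: bool = False) -> tuple[int, int, int, int]:
    """Return (C3, C2, C1, C0) for an 80-bit extended-precision value."""
    c1 = sign & 1  # C1 reflects the sign even when the register is empty
    j_bit = (significand >> 63) & 1
    fraction = significand & ((1 << 63) - 1)
    if empty:
        c3, c2, c0 = 1, 0, 1
    elif exponent == 0x7FFF:
        if j_bit == 0:
            c3, c2, c0 = 0, 0, 0  # pseudo-NaN or pseudo-infinity: unsupported
        elif fraction == 0:
            c3, c2, c0 = 0, 1, 1  # infinity
        else:
            c3, c2, c0 = 0, 0, 1  # NaN
    elif exponent == 0:
        if significand == 0:
            c3, c2, c0 = 1, 0, 0  # zero
        else:
            c3, c2, c0 = 1, 1, 0  # denormal (assumed for pseudo-denormals too)
    elif j_bit == 0:
        c3, c2, c0 = 0, 0, 0  # unnormal: unsupported
    else:
        c3, c2, c0 = 0, 1, 0  # normal finite number
    return c3, c2, c1, c0
```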

Operation 

C1 ← sign bit of ST; (* 0 for positive, 1 for negative *)

CASE (class of value or number in ST(0)) OF
    Unsupported: C3, C2, C0 ← 000;
    NaN:         C3, C2, C0 ← 001;
    Normal:      C3, C2, C0 ← 010;
    Infinity:    C3, C2, C0 ← 011;
    Zero:        C3, C2, C0 ← 100;
    Empty:       C3, C2, C0 ← 101;
    Denormal:    C3, C2, C0 ← 110;
ESAC;



FPU Flags Affected 



C1 Sign of value in ST(0).

C0, C2, C3 See Table 3-42.

Floating-Point Exceptions 

None 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 
Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 
FXCH—Exchange Register Contents


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
D9 C8+i | FXCH ST(i) | Valid | Valid | Exchange the contents of ST(0) and ST(i).
D9 C9 | FXCH | Valid | Valid | Exchange the contents of ST(0) and ST(1).


Description 

Exchanges the contents of registers ST(0) and ST(i). If no source operand is specified, the contents of ST(0) and 
ST(1) are exchanged. 

This instruction provides a simple means of moving values in the FPU register stack to the top of the stack [ST(0)],
so that they can be operated on by those floating-point instructions that can only operate on values in ST(0). For
example, the following instruction sequence takes the square root of the value in ST(3) and leaves the result in
ST(3):

FXCH ST(3); 

FSQRT; 

FXCH ST(3); 
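The FXCH ST(3); FSQRT; FXCH ST(3) sequence can be traced with a toy model in which a Python list holds ST(0), ST(1), and so on in order. The helpers `fxch` and `fsqrt` are hypothetical and ignore TOP, tags, exceptions, and 80-bit rounding.

```python
import math


def fxch(stack: list[float], i: int = 1) -> None:
    """Exchange ST(0) and ST(i); stack[0] models ST(0)."""
    stack[0], stack[i] = stack[i], stack[0]


def fsqrt(stack: list[float]) -> None:
    """Replace ST(0) with its square root."""
    stack[0] = math.sqrt(stack[0])


regs = [1.0, 2.0, 3.0, 16.0]  # ST(0)..ST(3)
fxch(regs, 3)  # FXCH ST(3): 16.0 moves to ST(0)
fsqrt(regs)    # FSQRT: ST(0) becomes 4.0
fxch(regs, 3)  # FXCH ST(3): result returns to ST(3), 1.0 back to ST(0)
```

After the sequence only ST(3) has changed, which is why the pair of exchanges is a cheap way to apply an ST(0)-only instruction to a deeper register.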

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF (Number-of-operands) is 1
    THEN
        temp ← ST(0);
        ST(0) ← SRC;
        SRC ← temp;
    ELSE
        temp ← ST(0);
        ST(0) ← ST(1);
        ST(1) ← temp;
FI;

FPU Flags Affected 

C1 Set to 0.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow occurred.

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 
Compatibility Mode Exceptions

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 
FXRSTOR—Restore x87 FPU, MMX, XMM, and MXCSR State


Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F AE /1 FXRSTOR m512byte | M | Valid | Valid | Restore the x87 FPU, MMX, XMM, and MXCSR register state from m512byte.
REX.W+ 0F AE /1 FXRSTOR64 m512byte | M | Valid | N.E. | Restore the x87 FPU, MMX, XMM, and MXCSR register state from m512byte.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (r) | NA | NA | NA


Description 

Reloads the x87 FPU, MMX technology, XMM, and MXCSR registers from the 512-byte memory image specified in 
the source operand. This data should have been written to memory previously using the FXSAVE instruction, and 
in the same format as required by the operating modes. The first byte of the data should be located on a 16-byte 
boundary. There are three distinct layouts of the FXSAVE state map: one for legacy and compatibility mode, a 
second format for 64-bit mode FXSAVE/FXRSTOR with REX.W=0, and the third format is for 64-bit mode with 
FXSAVE64/FXRSTOR64. Table 3-43 shows the layout of the legacy/compatibility mode state information in memory 
and describes the fields in the memory image for the FXRSTOR and FXSAVE instructions. Table 3-46 shows the 
layout of the 64-bit mode state information when REX.W is set (FXSAVE64/FXRSTOR64). Table 3-47 shows the 
layout of the 64-bit mode state information when REX.W is clear (FXSAVE/FXRSTOR). 

The state image referenced with an FXRSTOR instruction must have been saved using an FXSAVE instruction or be
in the same format as required by Table 3-43, Table 3-46, or Table 3-47. Referencing a state image saved with an
FSAVE or FNSAVE instruction, or one with an incompatible field layout, will result in an incorrect state restoration.

The FXRSTOR instruction does not flush pending x87 FPU exceptions. To check and raise exceptions when loading 
x87 FPU state information with the FXRSTOR instruction, use an FWAIT instruction after the FXRSTOR instruction. 

If the OSFXSR bit in control register CR4 is not set, the FXRSTOR instruction may not restore the states of the XMM 
and MXCSR registers. This behavior is implementation dependent. 

If the MXCSR state contains an unmasked exception with a corresponding status flag also set, loading the register 
with the FXRSTOR instruction will not result in a SIMD floating-point error condition being generated. Only the next 
occurrence of this unmasked exception will result in the exception being generated. 

Bits 16 through 31 of the MXCSR register are defined as reserved and should be set to 0. Attempting to write a 1 in
any of these bits from the saved state image will result in a general protection exception (#GP) being generated. 
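Before handing a hand-built or modified image to FXRSTOR, software can check the saved MXCSR value for reserved bits. This Python helper is a hypothetical sketch: beyond the rule stated here for bits 16 through 31, it applies the MXCSR_MASK guidance referenced in Table 3-44 ("Guidelines for Writing to the MXCSR Register", Volume 1, Chapter 11), including the convention that a saved MXCSR_MASK of 0 means the default mask 0000FFBFH.

```python
DEFAULT_MXCSR_MASK = 0x0000FFBF  # used when the saved MXCSR_MASK field is 0


def mxcsr_restore_would_fault(mxcsr: int, mxcsr_mask: int = 0) -> bool:
    """Return True if restoring this MXCSR value would set a reserved bit."""
    if mxcsr >> 16:
        return True  # bits 16 through 31 are always reserved
    mask = mxcsr_mask or DEFAULT_MXCSR_MASK
    # A bit that is clear in the mask is reserved on this processor.
    return (mxcsr & ~mask) != 0
```

The default mask clears bit 6 (DAZ), so an image that sets DAZ is only safe to restore when the saved MXCSR_MASK reports DAZ as supported.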

Bytes 464:511 of an FXSAVE image are available for software use. FXRSTOR ignores the content of bytes 464:511 
in an FXSAVE state image. 

Operation 

IF 64-Bit Mode
    THEN
        (x87 FPU, MMX, XMM15-XMM0, MXCSR) ← Load(SRC);
    ELSE
        (x87 FPU, MMX, XMM7-XMM0, MXCSR) ← Load(SRC);
FI;

x87 FPU and SIMD Floating-Point Exceptions

None.
Protected Mode Exceptions

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments.

If a memory operand is not aligned on a 16-byte boundary, regardless of segment. (See alignment
check exception [#AC] below.)

For an attempt to set reserved bits in MXCSR.

#SS(0) For an illegal address in the SS segment.

#PF(fault-code) For a page fault.

#NM If CR0.TS[bit 3] = 1.

If CR0.EM[bit 2] = 1.

#UD If CPUID.01H:EDX.FXSR[bit 24] = 0.

If instruction is preceded by a LOCK prefix.

#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 16-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#GP If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

If any part of the operand lies outside the effective address space from 0 to FFFFH.

For an attempt to set reserved bits in MXCSR.

#NM If CR0.TS[bit 3] = 1.

If CR0.EM[bit 2] = 1.

#UD If CPUID.01H:EDX.FXSR[bit 24] = 0.

If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

Same exceptions as in real address mode.

#PF(fault-code) For a page fault.

#AC For unaligned memory reference.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode.




64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

If memory operand is not aligned on a 16-byte boundary, regardless of segment. 

For an attempt to set reserved bits in MXCSR. 

#PF(fault-code) For a page fault. 

#NM If CR0.TS[bit 3] = 1. 

If CR0.EM[bit 2] = 1. 

#UD If CPUID.01H:EDX.FXSR[bit 24] = 0. 

If instruction is preceded by a LOCK prefix. 

#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 16-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).
FXSAVE—Save x87 FPU, MMX Technology, and SSE State


Opcode/Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
0F AE /0 FXSAVE m512byte | M | Valid | Valid | Save the x87 FPU, MMX, XMM, and MXCSR register state to m512byte.
REX.W+ 0F AE /0 FXSAVE64 m512byte | M | Valid | N.E. | Save the x87 FPU, MMX, XMM, and MXCSR register state to m512byte.


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
M | ModRM:r/m (w) | NA | NA | NA


Description 

Saves the current state of the x87 FPU, MMX technology, XMM, and MXCSR registers to a 512-byte memory location
specified in the destination operand. The content layout of the 512-byte region depends on whether the
processor is operating in non-64-bit operating modes or in the 64-bit sub-mode of IA-32e mode.

Bytes 464:511 are available for software use. The processor does not write to bytes 464:511 of an FXSAVE area.

The operation of FXSAVE in non-64-bit modes is described first. 

Non-64-Bit Mode Operation


Table 3-43 shows the layout of the state information in memory when the processor is operating in legacy modes. 

Table 3-43. Non-64-bit-Mode Layout of FXSAVE and FXRSTOR Memory Region

Offset | Bytes 15:0 of the 16-byte row (byte positions in parentheses)
0 | Rsvd (15:14), FCS (13:12), FIP[31:0] (11:8), FOP (7:6), Rsvd (5), FTW (4), FSW (3:2), FCW (1:0)
16 | MXCSR_MASK (15:12), MXCSR (11:8), Rsvd (7:6), FDS (5:4), FDP[31:0] (3:0)
32 | Reserved (15:10), ST0/MM0 (9:0)
48 | Reserved (15:10), ST1/MM1 (9:0)
64 | Reserved (15:10), ST2/MM2 (9:0)
80 | Reserved (15:10), ST3/MM3 (9:0)
96 | Reserved (15:10), ST4/MM4 (9:0)
112 | Reserved (15:10), ST5/MM5 (9:0)
128 | Reserved (15:10), ST6/MM6 (9:0)
144 | Reserved (15:10), ST7/MM7 (9:0)
160 | XMM0
176 | XMM1
192 | XMM2
208 | XMM3
224 | XMM4
240 | XMM5
256 | XMM6
272 | XMM7
288 | Reserved
304 | Reserved
320 | Reserved
336 | Reserved
352 | Reserved
368 | Reserved
384 | Reserved
400 | Reserved
416 | Reserved
432 | Reserved
448 | Reserved
464 | Available
480 | Available
496 | Available


The destination operand contains the first byte of the memory image, and it must be aligned on a 16-byte 
boundary. A misaligned destination operand will result in a general-protection (#GP) exception being generated (or 
in some cases, an alignment check exception [#AC]). 
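One common way to meet the 16-byte alignment requirement is to over-allocate by 15 bytes and round the start address up. This Python/ctypes sketch only computes such an address for a 512-byte area; it does not execute FXSAVE, and the helper and constant names are hypothetical.

```python
import ctypes

FXSAVE_AREA_SIZE = 512
FXSAVE_ALIGNMENT = 16


def aligned_fxsave_buffer():
    """Return (raw_buffer, aligned_address) for a 512-byte FXSAVE area.

    The raw buffer must be kept alive for as long as the aligned address
    is in use, because it owns the underlying memory.
    """
    raw = ctypes.create_string_buffer(FXSAVE_AREA_SIZE + FXSAVE_ALIGNMENT - 1)
    base = ctypes.addressof(raw)
    # Round up to the next multiple of 16 (no change if already aligned).
    aligned = (base + FXSAVE_ALIGNMENT - 1) & ~(FXSAVE_ALIGNMENT - 1)
    return raw, aligned
```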

The FXSAVE instruction is used when an operating system needs to perform a context switch or when an exception 
handler needs to save and examine the current state of the x87 FPU, MMX technology, and/or XMM and MXCSR 
registers. 

The fields in Table 3-43 are defined in Table 3-44. 


Table 3-44. Field Definitions 


Field 

Definition 

FCW 

x87 FPU Control Word (16 bits). See Figure 8-6 in the Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for the layout of the x87 FPU control word. 

FSW 

x87 FPU Status Word (16 bits). See Figure 8-4 in the Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for the layout of the x87 FPU status word. 

Abridged FTW 

x87 FPU Tag Word (8 bits). The tag information saved here is abridged, as described in the following 
paragraphs. 

FOP 

x87 FPU Opcode (16 bits). The lower 11 bits of this field contain the opcode, upper 5 bits are reserved. 
See Figure 8-8 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for
the layout of the x87 FPU opcode field. 

FIP 

x87 FPU Instruction Pointer Offset (64 bits). The contents of this field differ depending on the current 
addressing mode (32-bit, 16-bit, or 64-bit) of the processor when the FXSAVE instruction was 
executed: 

32-bit mode — 32-bit IP offset. 

16-bit mode — low 16 bits are IP offset; high 16 bits are reserved. 

64-bit mode with REX.W — 64-bit IP offset. 

64-bit mode without REX.W — 32-bit IP offset. 

See "x87 FPU Instruction and Operand (Data) Pointers" in Chapter 8 of the Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 1, for a description of the x87 FPU instruction 
pointer. 

FCS 

x87 FPU Instruction Pointer Selector (16 bits). If CPUID.(EAX=07H,ECX=0H):EBX[bit 13] = 1, the 
processor deprecates FCS and FDS, and this field is saved as 0000H.

FDP 

x87 FPU Instruction Operand (Data) Pointer Offset (64 bits). The contents of this field differ
depending on the current addressing mode (32-bit, 16-bit, or 64-bit) of the processor when the
FXSAVE instruction was executed:

32-bit mode — 32-bit DP offset. 

16-bit mode — low 16 bits are DP offset; high 16 bits are reserved. 

64-bit mode with REX.W — 64-bit DP offset. 

64-bit mode without REX.W — 32-bit DP offset. 

See "x87 FPU Instruction and Operand (Data) Pointers" in Chapter 8 of the Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 1, for a description of the x87 FPU operand 
pointer. 

FDS 

x87 FPU Instruction Operand (Data) Pointer Selector (16 bits). If CPUID.(EAX=07H,ECX=0H):EBX[bit
13] = 1, the processor deprecates FCS and FDS, and this field is saved as 0000H.

MXCSR 

MXCSR Register State (32 bits). See Figure 10-3 in the Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for the layout of the MXCSR register. If the OSFXSR bit in control 
register CR4 is not set, the FXSAVE instruction may not save this register. This behavior is 
implementation dependent. 

MXCSR_MASK

MXCSR_MASK (32 bits). This mask can be used to adjust values written to the MXCSR register, 
ensuring that reserved bits are set to 0. Set the mask bits and flags in MXCSR to the mode of 
operation desired for SSE and SSE2 SIMD floating-point instructions. See "Guidelines for Writing to the 
MXCSR Register" in Chapter 11 of the Intel® 64 and IA-32 Architectures Software Developer's Manual,
Volume 1, for instructions for how to determine and use the MXCSR_MASK value. 

ST0/MM0 through ST7/MM7

x87 FPU or MMX technology registers. These 80-bit fields contain the x87 FPU data registers or the 
MMX technology registers, depending on the state of the processor prior to the execution of the 
FXSAVE instruction. If the processor had been executing x87 FPU instructions prior to the FXSAVE
instruction, the x87 FPU data registers are saved; if it had been executing MMX instructions (or SSE or 
SSE2 instructions that operated on the MMX technology registers), the MMX technology registers are 
saved. When the MMX technology registers are saved, the high 16 bits of the field are reserved. 

XMM0 through XMM7

XMM registers (128 bits per field). If the OSFXSR bit in control register CR4 is not set, the FXSAVE 
instruction may not save these registers. This behavior is implementation dependent. 
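The first 32 bytes of the legacy image in Table 3-43 can be decoded with Python's `struct` module, using the field widths given in Table 3-44. This is an illustrative sketch that assumes the 512-byte image has already been copied into a `bytes` object; the function, constant, and dictionary key names are hypothetical, and only the offsets come from Table 3-43.

```python
import struct

# Bytes 0-31 of the legacy image, little-endian.
# H = 16-bit, B = 8-bit, I = 32-bit, x = one reserved byte.
LEGACY_HEADER = struct.Struct("<HHBxHIHxxIHxxII")
FIELDS = ("FCW", "FSW", "FTW", "FOP", "FIP", "FCS",
          "FDP", "FDS", "MXCSR", "MXCSR_MASK")


def parse_legacy_fxsave(image: bytes) -> dict:
    if len(image) != 512:
        raise ValueError("an FXSAVE image is 512 bytes")
    out = dict(zip(FIELDS, LEGACY_HEADER.unpack_from(image, 0)))
    # ST0/MM0..ST7/MM7: 10 significant bytes in each 16-byte row from 32.
    out["ST"] = [image[32 + 16 * i: 42 + 16 * i] for i in range(8)]
    # XMM0..XMM7: one full 16-byte row each, starting at offset 160.
    out["XMM"] = [image[160 + 16 * i: 176 + 16 * i] for i in range(8)]
    return out
```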


The FXSAVE instruction saves an abridged version of the x87 FPU tag word in the FTW field (unlike the FSAVE 
instruction, which saves the complete tag word). The tag information is saved in physical register order (R0
through R7), rather than in top-of-stack (TOS) order. With the FXSAVE instruction, however, only a single bit (1 for 
valid or 0 for empty) is saved for each tag. For example, assume that the tag word is currently set as follows: 

R7 R6 R5 R4 R3 R2 R1 R0

11 xx xx xx 11 11 11 11

Here, 11B indicates empty stack elements and "xx" indicates valid (00B), zero (01B), or special (10B).

For this example, the FXSAVE instruction saves only the following 8 bits of information: 

R7 R6 R5 R4 R3 R2 R1 R0

0 1 1 1 0 0 0 0 

Here, a 1 is saved for any valid, zero, or special tag, and a 0 is saved for any empty tag.
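The abridging rule (one bit per physical register, 0 only for an empty tag) can be sketched in Python. The function name is hypothetical; it assumes the full tag word keeps register Ri's 2-bit tag in bits 2i+1:2i. The test reproduces the example above, taking xx = 00 so that the full tag word is C0FFH.

```python
def abridge_tag_word(full_tag_word: int) -> int:
    """Compress a 16-bit x87 tag word into FXSAVE's 8-bit abridged FTW."""
    abridged = 0
    for i in range(8):  # physical registers R0..R7
        tag = (full_tag_word >> (2 * i)) & 0b11
        if tag != 0b11:  # valid, zero, or special -> 1; empty -> 0
            abridged |= 1 << i
    return abridged
```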

The operation of the FXSAVE instruction differs from that of the FSAVE instruction as follows:

• FXSAVE instruction does not check for pending unmasked floating-point exceptions. (The FXSAVE operation in 
this regard is similar to the operation of the FNSAVE instruction). 

• After the FXSAVE instruction has saved the state of the x87 FPU, MMX technology, XMM, and MXCSR registers, 
the processor retains the contents of the registers. Because of this behavior, the FXSAVE instruction cannot be 
used by an application program to pass a "clean" x87 FPU state to a procedure, since it retains the current
state. To clean the x87 FPU state, an application must explicitly execute an FINIT instruction after an FXSAVE 
instruction to reinitialize the x87 FPU state. 

• The format of the memory image saved with the FXSAVE instruction is the same regardless of the current 
addressing mode (32-bit or 16-bit) and operating mode (protected, real address, or system management). 
This behavior differs from the FSAVE instructions, where the memory image format is different depending on 
the addressing mode and operating mode. Because of the different image formats, the memory image saved 
with the FXSAVE instruction cannot be restored correctly with the FRSTOR instruction, and likewise the state 
saved with the FSAVE instruction cannot be restored correctly with the FXRSTOR instruction. 

The FSAVE format for FTW can be recreated from the FTW valid bits and the stored 80-bit FP data (assuming the 
stored data was not the contents of MMX technology registers) using Table 3-45. 


Table 3-45. Recreating FSAVE Format

Exponent all 1's | Exponent all 0's | Fraction all 0's | J and M bits | FTW valid bit | x87 FTW
0 | 0 | 0 | 0x | 1 | Special 10
0 | 0 | 0 | 1x | 1 | Valid 00
0 | 0 | 1 | 00 | 1 | Special 10
0 | 0 | 1 | 10 | 1 | Valid 00
0 | 1 | 0 | 0x | 1 | Special 10
0 | 1 | 0 | 1x | 1 | Special 10
0 | 1 | 1 | 00 | 1 | Zero 01
0 | 1 | 1 | 10 | 1 | Special 10
1 | 0 | 0 | 1x | 1 | Special 10
1 | 0 | 0 | 1x | 1 | Special 10
1 | 0 | 1 | 00 | 1 | Special 10
1 | 0 | 1 | 10 | 1 | Special 10
For all legal combinations above | 0 | Empty 11


The J-bit is defined to be the 1-bit binary integer to the left of the decimal place in the significand. The M-bit is 
defined to be the most significant bit of the fractional portion of the significand (i.e., the bit immediately to the right 
of the decimal place). 

When the M-bit is the most significant bit of the fractional portion of the significand, it must be 0 if the fraction is all
0's.
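Table 3-45 reduces to a short decision procedure. The Python sketch below takes the FTW valid bit plus the 15-bit exponent and 64-bit significand (J bit at bit 63, M bit at bit 62) of one stored register; it assumes the slot held x87 data rather than MMX data, reads "Fraction all 0's" as bits 62:0 being zero (consistent with the M-bit note above), and uses a hypothetical function name.

```python
def fsave_tag(valid_bit: int, exponent: int, significand: int) -> int:
    """Rebuild one 2-bit FSAVE tag: 00 valid, 01 zero, 10 special, 11 empty."""
    if not valid_bit:
        return 0b11  # empty, whatever the stored bits are
    j_bit = (significand >> 63) & 1
    fraction = significand & ((1 << 63) - 1)  # M bit and everything below it
    if exponent == 0x7FFF:
        return 0b10  # exponent all 1's: always special
    if exponent == 0:
        # Exponent all 0's: zero only if J = 0 and the fraction is all 0's.
        return 0b01 if (j_bit == 0 and fraction == 0) else 0b10
    return 0b00 if j_bit else 0b10  # J = 0 with a normal exponent: special
```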

IA-32e Mode Operation 

In compatibility sub-mode of IA-32e mode, legacy SSE registers, XMM0 through XMM7, are saved according to the
legacy FXSAVE map. In 64-bit mode, all of the SSE registers, XMM0 through XMM15, are saved. Additionally, there
are two different layouts of the FXSAVE map in 64-bit mode, corresponding to FXSAVE64 (which requires
REX.W=1) and FXSAVE (REX.W=0). In the FXSAVE64 map (Table 3-46), the FPU IP and FPU DP pointers are 64 bits
wide. In the FXSAVE map for 64-bit mode (Table 3-47), the FPU IP and FPU DP pointers are 32 bits wide.
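The two 64-bit-mode maps differ only in their first 32 bytes. This Python sketch decodes that header for either map using the offsets in Tables 3-46 and 3-47; the `rex_w` argument selects FXSAVE64 (REX.W = 1) or FXSAVE (REX.W = 0), and all names are hypothetical.

```python
import struct

# Bytes 0-31 of each 64-bit-mode map, little-endian; x = reserved byte.
FXSAVE64_HEADER = struct.Struct("<HHBxHQQII")            # REX.W = 1
FXSAVE_64MODE_HEADER = struct.Struct("<HHBxHIHxxIHxxII")  # REX.W = 0


def parse_64bit_header(image: bytes, rex_w: bool) -> dict:
    if rex_w:
        fcw, fsw, ftw, fop, fip, fdp, mxcsr, mask = \
            FXSAVE64_HEADER.unpack_from(image, 0)
        fcs = fds = None  # the FXSAVE64 map has no selector fields
    else:
        (fcw, fsw, ftw, fop, fip, fcs, fdp, fds,
         mxcsr, mask) = FXSAVE_64MODE_HEADER.unpack_from(image, 0)
    return {
        "FCW": fcw, "FSW": fsw, "FTW": ftw, "FOP": fop,
        "FIP": fip, "FCS": fcs, "FDP": fdp, "FDS": fds,
        "MXCSR": mxcsr, "MXCSR_MASK": mask,
        # XMM0..XMM15 start at offset 160 in both maps, 16 bytes each.
        "XMM": [image[160 + 16 * i: 176 + 16 * i] for i in range(16)],
    }
```

Reading an FXSAVE64 image with the REX.W = 0 layout would misinterpret the upper half of FIP as FCS, which is why the two formats must not be mixed.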
Table 3-46. Layout of the 64-bit-mode FXSAVE64 Map (requires REX.W = 1)

Offset | Bytes 15:0 of the 16-byte row (byte positions in parentheses)
0 | FIP (15:8), FOP (7:6), Reserved (5), FTW (4), FSW (3:2), FCW (1:0)
16 | MXCSR_MASK (15:12), MXCSR (11:8), FDP (7:0)
32 | Reserved (15:10), ST0/MM0 (9:0)
48 | Reserved (15:10), ST1/MM1 (9:0)
64 | Reserved (15:10), ST2/MM2 (9:0)
80 | Reserved (15:10), ST3/MM3 (9:0)
96 | Reserved (15:10), ST4/MM4 (9:0)
112 | Reserved (15:10), ST5/MM5 (9:0)
128 | Reserved (15:10), ST6/MM6 (9:0)
144 | Reserved (15:10), ST7/MM7 (9:0)
160 | XMM0
176 | XMM1
192 | XMM2
208 | XMM3
224 | XMM4
240 | XMM5
256 | XMM6
272 | XMM7
288 | XMM8
304 | XMM9
320 | XMM10
336 | XMM11
352 | XMM12
368 | XMM13
384 | XMM14
400 | XMM15
416 | Reserved
432 | Reserved
448 | Reserved
464 | Available
480 | Available
496 | Available
Table 3-47. Layout of the 64-bit-mode FXSAVE Map (REX.W = 0)

Offset | Bytes 15:0 of the 16-byte row (byte positions in parentheses)
0 | Reserved (15:14), FCS (13:12), FIP[31:0] (11:8), FOP (7:6), Reserved (5), FTW (4), FSW (3:2), FCW (1:0)
16 | MXCSR_MASK (15:12), MXCSR (11:8), Reserved (7:6), FDS (5:4), FDP[31:0] (3:0)
32 | Reserved (15:10), ST0/MM0 (9:0)
48 | Reserved (15:10), ST1/MM1 (9:0)
64 | Reserved (15:10), ST2/MM2 (9:0)
80 | Reserved (15:10), ST3/MM3 (9:0)
96 | Reserved (15:10), ST4/MM4 (9:0)
112 | Reserved (15:10), ST5/MM5 (9:0)
128 | Reserved (15:10), ST6/MM6 (9:0)
144 | Reserved (15:10), ST7/MM7 (9:0)
160 | XMM0
176 | XMM1
192 | XMM2
208 | XMM3
224 | XMM4
240 | XMM5
256 | XMM6
272 | XMM7
288 | XMM8
304 | XMM9
320 | XMM10
336 | XMM11
352 | XMM12
368 | XMM13
384 | XMM14
400 | XMM15
416 | Reserved
432 | Reserved
448 | Reserved
464 | Available
480 | Available
496 | Available
Operation

IF 64-Bit Mode
    THEN
        IF REX.W = 1
            THEN
                DEST ← Save64BitPromotedFxsave(x87 FPU, MMX, XMM15-XMM0, MXCSR);
            ELSE
                DEST ← Save64BitDefaultFxsave(x87 FPU, MMX, XMM15-XMM0, MXCSR);
        FI;
    ELSE
        DEST ← SaveLegacyFxsave(x87 FPU, MMX, XMM7-XMM0, MXCSR);
FI;

Protected Mode Exceptions 

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 

If a memory operand is not aligned on a 16-byte boundary, regardless of segment. (See the 
description of the alignment check exception [#AC] below.) 

#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

#NM If CR0.TS[bit 3] = 1.

If CR0.EM[bit 2] = 1.

#UD If CPUID.01H:EDX.FXSR[bit 24] = 0.

If the LOCK prefix is used.

#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 16-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).

Real-Address Mode Exceptions

#GP If a memory operand is not aligned on a 16-byte boundary, regardless of segment.

If any part of the operand lies outside the effective address space from 0 to FFFFH.

#NM If CR0.TS[bit 3] = 1.

If CR0.EM[bit 2] = 1.

#UD If CPUID.01H:EDX.FXSR[bit 24] = 0.

If the LOCK prefix is used.

Virtual-8086 Mode Exceptions

Same exceptions as in real address mode.

#PF(fault-code) For a page fault.

#AC For unaligned memory reference.

#UD If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.
64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

If memory operand is not aligned on a 16-byte boundary, regardless of segment.

#PF(fault-code) For a page fault.

#NM If CR0.TS[bit 3] = 1.

If CR0.EM[bit 2] = 1.

#UD If CPUID.01H:EDX.FXSR[bit 24] = 0.

If the LOCK prefix is used.

#AC If this exception is disabled, a general protection exception (#GP) is signaled if the memory
operand is not aligned on a 16-byte boundary, as described above. If the alignment check
exception (#AC) is enabled (and the CPL is 3), signaling of #AC is not guaranteed and may
vary with implementation, as follows. In all implementations where #AC is not signaled, a
general protection exception is signaled in its place. In addition, the width of the alignment
check may also vary with implementation. For instance, for a given implementation, an alignment
check exception might be signaled for a 2-byte misalignment, whereas a general protection
exception might be signaled for all other misalignments (4-, 8-, or 16-byte misalignments).


Implementation Note 

The order in which the processor signals general-protection (#GP) and page-fault (#PF) exceptions when they both
occur on an instruction boundary is given in Table 5-2 in the Intel® 64 and IA-32 Architectures Software Developer's
Manual, Volume 3B. This order may vary for FXSAVE for different processor implementations.
FXTRACT—Extract Exponent and Significand


Opcode | Instruction | 64-Bit Mode | Compat/Leg Mode | Description
D9 F4 | FXTRACT | Valid | Valid | Separate value in ST(0) into exponent and significand, store exponent in ST(0), and push the significand onto the register stack.


Description 

Separates the source value in the ST(0) register into its exponent and significand, stores the exponent in ST(0),
and pushes the significand onto the register stack. Following this operation, the new top-of-stack register ST(0)
contains the value of the original significand expressed as a floating-point value. The sign and significand of this
value are the same as those found in the source operand, and the exponent is 3FFFH (biased value for a true exponent
of zero). The ST(1) register contains the value of the original operand's true (unbiased) exponent expressed
as a floating-point value. (The operation performed by this instruction is a superset of the IEEE-recommended
logb(x) function.)

This instruction and the F2XM1 instruction are useful for performing power and range scaling operations. The 
FXTRACT instruction is also useful for converting numbers in double extended-precision floating-point format to 
decimal representations (e.g., for printing or displaying). 

If the floating-point zero-divide exception (#Z) is masked and the source operand is zero, an exponent value of -∞
is stored in register ST(1) and 0 with the sign of the source operand is stored in register ST(0).

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

TEMP ← Significand(ST(0));
ST(0) ← Exponent(ST(0));
TOP ← TOP - 1;
ST(0) ← TEMP;

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred; set to 1 if stack overflow occurred.

C0, C2, C3 Undefined.

Floating-Point Exceptions 

#IS Stack underflow or overflow occurred.

#IA Source operand is an SNaN value or unsupported format.

#Z ST(0) operand is ±0.

#D Source operand is a denormal value. 

Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 
Compatibility Mode Exceptions

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

Same exceptions as in protected mode. 




INSTRUCTION SET REFERENCE, A-L 


FYL2X—Compute y * log2(x)


Opcode  Instruction  64-Bit Mode  Compat/Leg Mode  Description
D9 F1   FYL2X        Valid        Valid            Replace ST(1) with (ST(1) * log2(ST(0))) and pop the register stack.


Description 

Computes (ST(1) * log2(ST(0))), stores the result in register ST(1), and pops the FPU register stack. The source
operand in ST(0) must be a non-zero positive number.

The following table shows the results obtained when taking the log of various classes of numbers, assuming that 
neither overflow nor underflow occurs. 

Table 3-48. FYL2X Results 


                     ST(0)
            -∞        -F        ±0        +0<+F<+1  +1        +F>+1     +∞        NaN
ST(1) -∞    *         *         +∞        +∞        *         -∞        -∞        NaN
      -F    *         *         **        +F        -0        -F        -∞        NaN
      -0    *         *         *         +0        -0        -0        *         NaN
      +0    *         *         *         -0        +0        +0        *         NaN
      +F    *         *         **        -F        +0        +F        +∞        NaN
      +∞    *         *         -∞        -∞        *         +∞        +∞        NaN
      NaN   NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN


NOTES: 

F Means finite floating-point value. 

* Indicates floating-point invalid-operation (#IA) exception. 
** Indicates floating-point zero-divide (#Z) exception. 


If the divide-by-zero exception is masked and register ST(0) contains ±0, the instruction returns ∞ with a sign that
is the opposite of the sign of the source operand in register ST(1).

The FYL2X instruction is designed with a built-in multiplication to optimize the calculation of logarithms with an 
arbitrary positive base (b): 

$$\log_b x \leftarrow (\log_2 b)^{-1} \cdot \log_2 x$$

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(1) ← ST(1) * log2(ST(0));

PopRegisterStack; 

FPU Flags Affected 

C1 Set to 0 if stack underflow occurred.
   Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.
Floating-Point Exceptions

#IS Stack underflow occurred.
#IA Either operand is an SNaN or unsupported format.
    Source operand in register ST(0) is a negative finite value (not -0).
#Z  Source operand in register ST(0) is ±0.
#D  Source operand is a denormal value.
#U  Result is too small for destination format.
#O  Result is too large for destination format.
#P  Value cannot be represented exactly in destination format.


Protected Mode Exceptions 

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1. 

#MF If there is a pending x87 FPU exception. 

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
FYL2XP1—Compute y * log2(x + 1)


Opcode  Instruction  64-Bit Mode  Compat/Leg Mode  Description
D9 F9   FYL2XP1      Valid        Valid            Replace ST(1) with ST(1) * log2(ST(0) + 1.0) and pop the register stack.


Description 

Computes (ST(1) * log2(ST(0) + 1.0)), stores the result in register ST(1), and pops the FPU register stack. The
source operand in ST(0) must be in the range:

$$-\left(1 - \frac{\sqrt{2}}{2}\right) \text{ to } \left(1 - \frac{\sqrt{2}}{2}\right)$$

The source operand in ST(1) can range from -∞ to +∞. If the ST(0) operand is outside of its acceptable range, the
result is undefined and software should not rely on an exception being generated. Under some circumstances
exceptions may be generated when ST(0) is out of range, but this behavior is implementation specific and not
guaranteed.

The following table shows the results obtained when taking the log epsilon of various classes of numbers, assuming 
that underflow does not occur. 


Table 3-49. FYL2XP1 Results 



                                  ST(0)
             -(1 - (√2/2)) to -0   -0     +0     +0 to +(1 - (√2/2))   NaN
ST(1) -∞     +∞                    *      *      -∞                    NaN
      -F     +F                    +0     -0     -F                    NaN
      -0     +0                    +0     -0     -0                    NaN
      +0     -0                    -0     +0     +0                    NaN
      +F     -F                    -0     +0     +F                    NaN
      +∞     -∞                    *      *      +∞                    NaN
      NaN    NaN                   NaN    NaN    NaN                   NaN


NOTES: 


F Means finite floating-point value. 

* Indicates floating-point invalid-operation (#IA) exception. 

This instruction provides optimal accuracy for values of epsilon [the value in register ST(0)] that are close to 0. For
small epsilon (ε) values, more significant digits can be retained by using the FYL2XP1 instruction than by using
(ε + 1) as an argument to the FYL2X instruction. The (ε + 1) expression is commonly found in compound interest and
annuity calculations. The result can be simply converted into a value in another logarithm base by including a scale
factor in the ST(1) source operand. The following equation is used to calculate the scale factor for a particular
logarithm base, where n is the logarithm base desired for the result of the FYL2XP1 instruction:

$$\text{scale factor} \leftarrow \log_n 2$$

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

ST(1) ← ST(1) * log2(ST(0) + 1.0);

PopRegisterStack; 
FPU Flags Affected

C1 Set to 0 if stack underflow occurred.
   Set if result was rounded up; cleared otherwise.

C0, C2, C3 Undefined.


Floating-Point Exceptions

#IS Stack underflow occurred.
#IA Either operand is an SNaN value or unsupported format.
#D  Source operand is a denormal value.
#U  Result is too small for destination format.
#O  Result is too large for destination format.
#P  Value cannot be represented exactly in destination format.


Protected Mode Exceptions

#NM CR0.EM[bit 2] or CR0.TS[bit 3] = 1.
#MF If there is a pending x87 FPU exception.
#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
HADDPD—Packed Double-FP Horizontal Add


Opcode/Instruction             Op/En  64/32-bit Mode  CPUID Feature Flag  Description
66 0F 7C /r                    RM     V/V             SSE3                Horizontal add packed double-precision floating-point
HADDPD xmm1, xmm2/m128                                                    values from xmm2/m128 to xmm1.
VEX.NDS.128.66.0F.WIG 7C /r    RVM    V/V             AVX                 Horizontal add packed double-precision floating-point
VHADDPD xmm1, xmm2, xmm3/m128                                             values from xmm2 and xmm3/mem.
VEX.NDS.256.66.0F.WIG 7C /r    RVM    V/V             AVX                 Horizontal add packed double-precision floating-point
VHADDPD ymm1, ymm2, ymm3/m256                                             values from ymm2 and ymm3/mem.


Instruction Operand Encoding 


Op/En  Operand 1          Operand 2      Operand 3      Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA             NA
RVM    ModRM:reg (w)      VEX.vvvv (r)   ModRM:r/m (r)  NA


Description 

Adds the double-precision floating-point values in the high and low quadwords of the destination operand and 
stores the result in the low quadword of the destination operand. 

Adds the double-precision floating-point values in the high and low quadwords of the source operand and stores the 
result in the high quadword of the destination operand. 

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
See Figure 3-16 for HADDPD; see Figure 3-17 for VHADDPD.
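To make the element ordering concrete, here is a minimal scalar model of the legacy SSE form in C. It works on plain arrays instead of XMM registers; the function name `haddpd_model` and the mapping of array index 0 to bits 63:0 are assumptions made for illustration. Production code would normally use the `_mm_hadd_pd` intrinsic listed for this instruction.

```c
/* Hypothetical scalar reference model of HADDPD xmm1, xmm2/m128.
 * Index 0 stands for the low quadword (bits 63:0), index 1 for the
 * high quadword (bits 127:64). Both sums are computed before dst is
 * written, so dst and src may alias, as with HADDPD xmm1, xmm1. */
void haddpd_model(double dst[2], const double src[2])
{
    double lo = dst[0] + dst[1];   /* destination low + high -> low quadword  */
    double hi = src[0] + src[1];   /* source low + high      -> high quadword */
    dst[0] = lo;
    dst[1] = hi;
}
```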





Figure 3-16. HADDPD—Packed Double-FP Horizontal Add 
Figure 3-17. VHADDPD operation

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding
YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128)
of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.

Operation 

HADDPD (128-bit Legacy SSE version)

DEST[63:0] ← SRC1[127:64] + SRC1[63:0]
DEST[127:64] ← SRC2[127:64] + SRC2[63:0]
DEST[VLMAX-1:128] (Unmodified)

VHADDPD (VEX.128 encoded version)

DEST[63:0] ← SRC1[127:64] + SRC1[63:0]
DEST[127:64] ← SRC2[127:64] + SRC2[63:0]
DEST[VLMAX-1:128] ← 0

VHADDPD (VEX.256 encoded version)

DEST[63:0] ← SRC1[127:64] + SRC1[63:0]
DEST[127:64] ← SRC2[127:64] + SRC2[63:0]
DEST[191:128] ← SRC1[255:192] + SRC1[191:128]
DEST[255:192] ← SRC2[255:192] + SRC2[191:128]
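The 256-bit form does not add across the two 128-bit halves: as the pseudocode above shows, each half is processed exactly like the 128-bit form. A minimal C sketch of that lane-wise behavior follows; it uses plain arrays, a hypothetical function name, and the assumption that array index i stands for bits (64i+63):64i.

```c
/* Hypothetical scalar model of VHADDPD ymm1, ymm2, ymm3/m256.
 * Results are gathered in a temporary so dst may alias a source. */
void vhaddpd256_model(double dst[4], const double src1[4], const double src2[4])
{
    double r[4];
    r[0] = src1[1] + src1[0];   /* DEST[63:0]    (lower lane, SRC1) */
    r[1] = src2[1] + src2[0];   /* DEST[127:64]  (lower lane, SRC2) */
    r[2] = src1[3] + src1[2];   /* DEST[191:128] (upper lane, SRC1) */
    r[3] = src2[3] + src2[2];   /* DEST[255:192] (upper lane, SRC2) */
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```

Because of this lane-wise behavior, VHADDPD alone cannot produce the sum of all four elements of a YMM register; the two 128-bit halves still have to be combined separately.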

Intel C/C++ Compiler Intrinsic Equivalent 

VHADDPD: __m256d _mm256_hadd_pd (__m256d a, __m256d b);

HADDPD: __m128d _mm_hadd_pd (__m128d a, __m128d b);

Exceptions 

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a
general-protection exception (#GP) will be generated.
Numeric Exceptions

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

See Exceptions Type 2. 
HADDPS—Packed Single-FP Horizontal Add


Opcode/Instruction             Op/En  64/32-bit Mode  CPUID Feature Flag  Description
F2 0F 7C /r                    RM     V/V             SSE3                Horizontal add packed single-precision floating-point
HADDPS xmm1, xmm2/m128                                                    values from xmm2/m128 to xmm1.
VEX.NDS.128.F2.0F.WIG 7C /r    RVM    V/V             AVX                 Horizontal add packed single-precision floating-point
VHADDPS xmm1, xmm2, xmm3/m128                                             values from xmm2 and xmm3/mem.
VEX.NDS.256.F2.0F.WIG 7C /r    RVM    V/V             AVX                 Horizontal add packed single-precision floating-point
VHADDPS ymm1, ymm2, ymm3/m256                                             values from ymm2 and ymm3/mem.


Instruction Operand Encoding

Op/En  Operand 1          Operand 2      Operand 3      Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA             NA
RVM    ModRM:reg (w)      VEX.vvvv (r)   ModRM:r/m (r)  NA


Description 

Adds the single-precision floating-point values in the first and second dwords of the destination operand and stores
the result in the first dword of the destination operand.

Adds the single-precision floating-point values in the third and fourth dwords of the destination operand and stores
the result in the second dword of the destination operand.

Adds the single-precision floating-point values in the first and second dwords of the source operand and stores the
result in the third dword of the destination operand.

Adds the single-precision floating-point values in the third and fourth dwords of the source operand and stores the
result in the fourth dword of the destination operand.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
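The four steps above map directly onto a small scalar model. This is a minimal C sketch on plain arrays, with a hypothetical function name and the assumption that array index i stands for dword bits (32i+31):32i; it is not the `_mm_hadd_ps` intrinsic itself.

```c
/* Hypothetical scalar reference model of HADDPS xmm1, xmm2/m128.
 * All four sums are computed before dst is written, so dst and src
 * may alias. */
void haddps_model(float dst[4], const float src[4])
{
    float r[4];
    r[0] = dst[0] + dst[1];   /* first + second dword of destination */
    r[1] = dst[2] + dst[3];   /* third + fourth dword of destination */
    r[2] = src[0] + src[1];   /* first + second dword of source      */
    r[3] = src[2] + src[3];   /* third + fourth dword of source      */
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```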
See Figure 3-18 for HADDPS; see Figure 3-19 for VHADDPS.


[Figure: HADDPS xmm1, xmm2/m128 operands and RESULT xmm1, dword positions [127:96] [95:64] [63:32] [31:0]]


Figure 3-18. HADDPS—Packed Single-FP Horizontal Add 


SRC1   X7     X6     X5     X4     X3     X2     X1     X0
SRC2   Y7     Y6     Y5     Y4     Y3     Y2     Y1     Y0
DEST   Y6+Y7  Y4+Y5  X6+X7  X4+X5  Y2+Y3  Y0+Y1  X2+X3  X0+X1


Figure 3-19. VHADDPS operation 


128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding
YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128)
of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.
Operation

HADDPS (128-bit Legacy SSE version)

DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[VLMAX-1:128] (Unmodified)

VHADDPS (VEX.128 encoded version)

DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[VLMAX-1:128] ← 0

VHADDPS (VEX.256 encoded version)

DEST[31:0] ← SRC1[63:32] + SRC1[31:0]
DEST[63:32] ← SRC1[127:96] + SRC1[95:64]
DEST[95:64] ← SRC2[63:32] + SRC2[31:0]
DEST[127:96] ← SRC2[127:96] + SRC2[95:64]
DEST[159:128] ← SRC1[191:160] + SRC1[159:128]
DEST[191:160] ← SRC1[255:224] + SRC1[223:192]
DEST[223:192] ← SRC2[191:160] + SRC2[159:128]
DEST[255:224] ← SRC2[255:224] + SRC2[223:192]

Intel C/C++ Compiler Intrinsic Equivalent 

HADDPS: __m128 _mm_hadd_ps (__m128 a, __m128 b);

VHADDPS: __m256 _mm256_hadd_ps (__m256 a, __m256 b);

Exceptions 

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a
general-protection exception (#GP) will be generated.

Numeric Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

See Exceptions Type 2. 
HLT-Halt


Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
F4      HLT          NP     Valid        Valid            Halt


Instruction Operand Encoding 


Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Stops instruction execution and places the processor in a HALT state. An enabled interrupt (including NMI and 
SMI), a debug exception, the BINIT# signal, the INIT# signal, or the RESET# signal will resume execution. If an 
interrupt (including NMI) is used to resume execution after a HLT instruction, the saved instruction pointer 
(CS:EIP) points to the instruction following the HLT instruction. 

When a HLT instruction is executed on an Intel 64 or IA-32 processor supporting Intel Hyper-Threading Technology,
only the logical processor that executes the instruction is halted. The other logical processors in the physical 
processor remain active, unless they are each individually halted by executing a HLT instruction. 

The HLT instruction is a privileged instruction. When the processor is running in protected or virtual-8086 mode, 
the privilege level of a program or procedure must be 0 to execute the HLT instruction. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

Enter Halt state; 

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

None. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 




HSUBPD—Packed Double-FP Horizontal Subtract 


Opcode/Instruction             Op/En  64/32-bit Mode  CPUID Feature Flag  Description
66 0F 7D /r                    RM     V/V             SSE3                Horizontal subtract packed double-precision floating-point
HSUBPD xmm1, xmm2/m128                                                    values from xmm2/m128 to xmm1.
VEX.NDS.128.66.0F.WIG 7D /r    RVM    V/V             AVX                 Horizontal subtract packed double-precision floating-point
VHSUBPD xmm1, xmm2, xmm3/m128                                             values from xmm2 and xmm3/mem.
VEX.NDS.256.66.0F.WIG 7D /r    RVM    V/V             AVX                 Horizontal subtract packed double-precision floating-point
VHSUBPD ymm1, ymm2, ymm3/m256                                             values from ymm2 and ymm3/mem.


Instruction Operand Encoding

Op/En  Operand 1          Operand 2      Operand 3      Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA             NA
RVM    ModRM:reg (w)      VEX.vvvv (r)   ModRM:r/m (r)  NA


Description 

The HSUBPD instruction horizontally subtracts the packed double-precision floating-point values of both operands.

Subtracts the double-precision floating-point value in the high quadword of the destination operand from the low
quadword of the destination operand and stores the result in the low quadword of the destination operand.

Subtracts the double-precision floating-point value in the high quadword of the source operand from the low
quadword of the source operand and stores the result in the high quadword of the destination operand.

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 

See Figure 3-20 for HSUBPD; see Figure 3-21 for VHSUBPD. 
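Unlike addition, the operand order matters here: the high element is subtracted from the low one. A minimal scalar C sketch of the legacy SSE form makes that explicit; the function name `hsubpd_model` is hypothetical, and index 0 is assumed to stand for the low quadword (bits 63:0).

```c
/* Hypothetical scalar reference model of HSUBPD xmm1, xmm2/m128.
 * Each result is low element minus high element. Both differences
 * are computed before dst is written, so dst and src may alias. */
void hsubpd_model(double dst[2], const double src[2])
{
    double lo = dst[0] - dst[1];   /* destination: low - high -> low quadword  */
    double hi = src[0] - src[1];   /* source:      low - high -> high quadword */
    dst[0] = lo;
    dst[1] = hi;
}
```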





Figure 3-20. HSUBPD—Packed Double-FP Horizontal Subtract 
Figure 3-21. VHSUBPD operation

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding
YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128)
of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.

Operation 

HSUBPD (128-bit Legacy SSE version)

DEST[63:0] ← SRC1[63:0] - SRC1[127:64]
DEST[127:64] ← SRC2[63:0] - SRC2[127:64]
DEST[VLMAX-1:128] (Unmodified)

VHSUBPD (VEX.128 encoded version)

DEST[63:0] ← SRC1[63:0] - SRC1[127:64]
DEST[127:64] ← SRC2[63:0] - SRC2[127:64]
DEST[VLMAX-1:128] ← 0

VHSUBPD (VEX.256 encoded version)

DEST[63:0] ← SRC1[63:0] - SRC1[127:64]
DEST[127:64] ← SRC2[63:0] - SRC2[127:64]
DEST[191:128] ← SRC1[191:128] - SRC1[255:192]
DEST[255:192] ← SRC2[191:128] - SRC2[255:192]

Intel C/C++ Compiler Intrinsic Equivalent

HSUBPD: __m128d _mm_hsub_pd (__m128d a, __m128d b);

VHSUBPD: __m256d _mm256_hsub_pd (__m256d a, __m256d b);

Exceptions 

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a
general-protection exception (#GP) will be generated.

Numeric Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 
Other Exceptions

See Exceptions Type 2. 
HSUBPS—Packed Single-FP Horizontal Subtract


Opcode/Instruction             Op/En  64/32-bit Mode  CPUID Feature Flag  Description
F2 0F 7D /r                    RM     V/V             SSE3                Horizontal subtract packed single-precision floating-point
HSUBPS xmm1, xmm2/m128                                                    values from xmm2/m128 to xmm1.
VEX.NDS.128.F2.0F.WIG 7D /r    RVM    V/V             AVX                 Horizontal subtract packed single-precision floating-point
VHSUBPS xmm1, xmm2, xmm3/m128                                             values from xmm2 and xmm3/mem.
VEX.NDS.256.F2.0F.WIG 7D /r    RVM    V/V             AVX                 Horizontal subtract packed single-precision floating-point
VHSUBPS ymm1, ymm2, ymm3/m256                                             values from ymm2 and ymm3/mem.


Instruction Operand Encoding 


Op/En  Operand 1          Operand 2      Operand 3      Operand 4
RM     ModRM:reg (r, w)   ModRM:r/m (r)  NA             NA
RVM    ModRM:reg (w)      VEX.vvvv (r)   ModRM:r/m (r)  NA


Description 

Subtracts the single-precision floating-point value in the second dword of the destination operand from the first 
dword of the destination operand and stores the result in the first dword of the destination operand. 

Subtracts the single-precision floating-point value in the fourth dword of the destination operand from the third 
dword of the destination operand and stores the result in the second dword of the destination operand. 

Subtracts the single-precision floating-point value in the second dword of the source operand from the first dword 
of the source operand and stores the result in the third dword of the destination operand. 

Subtracts the single-precision floating-point value in the fourth dword of the source operand from the third dword 
of the source operand and stores the result in the fourth dword of the destination operand. 

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 

See Figure 3-22 for HSUBPS; see Figure 3-23 for VHSUBPS.
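The four subtractions above can be modeled in the same way as HADDPS, with the second element of each pair subtracted from the first. This minimal C sketch assumes array index i stands for dword bits (32i+31):32i and uses a hypothetical function name.

```c
/* Hypothetical scalar reference model of HSUBPS xmm1, xmm2/m128.
 * Each result is the even-indexed dword minus the next dword; all
 * results are computed before dst is written, so dst and src may alias. */
void hsubps_model(float dst[4], const float src[4])
{
    float r[4];
    r[0] = dst[0] - dst[1];   /* first - second dword of destination */
    r[1] = dst[2] - dst[3];   /* third - fourth dword of destination */
    r[2] = src[0] - src[1];   /* first - second dword of source      */
    r[3] = src[2] - src[3];   /* third - fourth dword of source      */
    for (int i = 0; i < 4; i++)
        dst[i] = r[i];
}
```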
[Figure: HSUBPS xmm1, xmm2/m128 operands and RESULT xmm1, dword positions [127:96] [95:64] [63:32] [31:0]]
Figure 3-22. HSUBPS—Packed Single-FP Horizontal Subtract

Figure 3-23. VHSUBPS operation

128-bit Legacy SSE version: The second source can be an XMM register or a 128-bit memory location. The destination
is not distinct from the first source XMM register and the upper bits (VLMAX-1:128) of the corresponding
YMM register destination are unmodified.

VEX.128 encoded version: The first source operand is an XMM register. The second source operand can be an XMM
register or a 128-bit memory location. The destination operand is an XMM register. The upper bits (VLMAX-1:128)
of the corresponding YMM register destination are zeroed.

VEX.256 encoded version: The first source operand is a YMM register. The second source operand can be a YMM
register or a 256-bit memory location. The destination operand is a YMM register.
Operation

HSUBPS (128-bit Legacy SSE version)

DEST[31:0] ← SRC1[31:0] - SRC1[63:32]
DEST[63:32] ← SRC1[95:64] - SRC1[127:96]
DEST[95:64] ← SRC2[31:0] - SRC2[63:32]
DEST[127:96] ← SRC2[95:64] - SRC2[127:96]
DEST[VLMAX-1:128] (Unmodified)

VHSUBPS (VEX.128 encoded version)

DEST[31:0] ← SRC1[31:0] - SRC1[63:32]
DEST[63:32] ← SRC1[95:64] - SRC1[127:96]
DEST[95:64] ← SRC2[31:0] - SRC2[63:32]
DEST[127:96] ← SRC2[95:64] - SRC2[127:96]
DEST[VLMAX-1:128] ← 0

VHSUBPS (VEX.256 encoded version)

DEST[31:0] ← SRC1[31:0] - SRC1[63:32]
DEST[63:32] ← SRC1[95:64] - SRC1[127:96]
DEST[95:64] ← SRC2[31:0] - SRC2[63:32]
DEST[127:96] ← SRC2[95:64] - SRC2[127:96]
DEST[159:128] ← SRC1[159:128] - SRC1[191:160]
DEST[191:160] ← SRC1[223:192] - SRC1[255:224]
DEST[223:192] ← SRC2[159:128] - SRC2[191:160]
DEST[255:224] ← SRC2[223:192] - SRC2[255:224]

Intel C/C++ Compiler Intrinsic Equivalent 

HSUBPS: __m128 _mm_hsub_ps (__m128 a, __m128 b);

VHSUBPS: __m256 _mm256_hsub_ps (__m256 a, __m256 b);

Exceptions 

When the source operand is a memory operand, the operand must be aligned on a 16-byte boundary or a
general-protection exception (#GP) will be generated.

Numeric Exceptions 

Overflow, Underflow, Invalid, Precision, Denormal 

Other Exceptions 

See Exceptions Type 2. 
IDIV—Signed Divide


Opcode         Instruction   Op/En  64-Bit Mode  Compat/Leg Mode  Description
F6 /7          IDIV r/m8     M      Valid        Valid            Signed divide AX by r/m8, with result stored in:
                                                                  AL ← Quotient, AH ← Remainder.
REX + F6 /7    IDIV r/m8*    M      Valid        N.E.             Signed divide AX by r/m8, with result stored in
                                                                  AL ← Quotient, AH ← Remainder.
F7 /7          IDIV r/m16    M      Valid        Valid            Signed divide DX:AX by r/m16, with result
                                                                  stored in AX ← Quotient, DX ← Remainder.
F7 /7          IDIV r/m32    M      Valid        Valid            Signed divide EDX:EAX by r/m32, with result
                                                                  stored in EAX ← Quotient, EDX ← Remainder.
REX.W + F7 /7  IDIV r/m64    M      Valid        N.E.             Signed divide RDX:RAX by r/m64, with result
                                                                  stored in RAX ← Quotient, RDX ← Remainder.


NOTES: 

* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (r)  NA         NA         NA


Description 

Divides the (signed) value in the AX, DX:AX, or EDX:EAX (dividend) by the source operand (divisor) and stores the 
result in the AX (AH:AL), DX:AX, or EDX:EAX registers. The source operand can be a general-purpose register or a 
memory location. The action of this instruction depends on the operand size (dividend/divisor). 

Non-integral results are truncated (chopped) towards 0. The remainder is always less than the divisor in magnitude.
Overflow is indicated with the #DE (divide error) exception rather than with the CF flag.

In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to
additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. In 64-bit mode when REX.W is
applied, the instruction divides the signed value in RDX:RAX by the source operand. RAX contains a 64-bit
quotient; RDX contains a 64-bit remainder.
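The truncation and overflow rules above can be mirrored in portable C, because C99 integer division also truncates toward zero and gives the remainder the sign of the dividend, as IDIV does. This minimal sketch models the 32-bit operand size (EDX:EAX divided by r/m32). The function name `idiv32_model` is hypothetical; it returns -1 wherever the instruction would signal #DE instead of producing a result.

```c
#include <stdint.h>

/* Hypothetical model of 32-bit IDIV: divides the 64-bit signed dividend
 * EDX:EAX by a 32-bit divisor. Returns 0 on success, or -1 where the
 * processor would raise #DE (zero divisor or quotient out of range). */
int idiv32_model(int64_t edx_eax, int32_t divisor,
                 int32_t *quotient, int32_t *remainder)
{
    if (divisor == 0)
        return -1;                        /* #DE: divide by zero */
    /* INT64_MIN / -1 overflows int64_t (undefined behavior in C);
     * its quotient 2^63 is far too large for EAX anyway. */
    if (edx_eax == INT64_MIN && divisor == -1)
        return -1;
    int64_t q = edx_eax / divisor;        /* truncates toward zero */
    if (q > INT32_MAX || q < INT32_MIN)
        return -1;                        /* #DE: quotient does not fit in EAX */
    *quotient = (int32_t)q;
    *remainder = (int32_t)(edx_eax % divisor);  /* |remainder| < |divisor| */
    return 0;
}
```

For example, -7 divided by 2 gives a quotient of -3 and a remainder of -1, while dividing -2^31 by -1 fails because +2^31 does not fit in 32 bits.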

See the summary chart at the beginning of this section for encoding data and limits. See Table 3-50. 


Table 3-50. IDIV Results 


Operand Size              Dividend  Divisor  Quotient  Remainder  Quotient Range
Word/byte                 AX        r/m8     AL        AH         -128 to +127
Doubleword/word           DX:AX     r/m16    AX        DX         -32,768 to +32,767
Quadword/doubleword       EDX:EAX   r/m32    EAX       EDX        -2^31 to 2^31 - 1
Doublequadword/quadword   RDX:RAX   r/m64    RAX       RDX        -2^63 to 2^63 - 1
Operation

IF SRC = 0
    THEN #DE; (* Divide error *)
FI;
IF OperandSize = 8 (* Word/byte operation *)
    THEN
        temp ← AX / SRC; (* Signed division *)
        IF (temp > 7FH) or (temp < 80H)
        (* If a positive result is greater than 7FH or a negative result is less than 80H *)
            THEN #DE; (* Divide error *)
            ELSE
                AL ← temp;
                AH ← AX SignedModulus SRC;
        FI;
ELSE IF OperandSize = 16 (* Doubleword/word operation *)
    THEN
        temp ← DX:AX / SRC; (* Signed division *)
        IF (temp > 7FFFH) or (temp < 8000H)
        (* If a positive result is greater than 7FFFH
           or a negative result is less than 8000H *)
            THEN #DE; (* Divide error *)
            ELSE
                AX ← temp;
                DX ← DX:AX SignedModulus SRC;
        FI;
ELSE IF OperandSize = 32 (* Quadword/doubleword operation *)
    THEN
        temp ← EDX:EAX / SRC; (* Signed division *)
        IF (temp > 7FFFFFFFH) or (temp < 80000000H)
        (* If a positive result is greater than 7FFFFFFFH
           or a negative result is less than 80000000H *)
            THEN #DE; (* Divide error *)
            ELSE
                EAX ← temp;
                EDX ← EDX:EAX SignedModulus SRC;
        FI;
ELSE IF OperandSize = 64 (* Doublequadword/quadword operation *)
    THEN
        temp ← RDX:RAX / SRC; (* Signed division *)
        IF (temp > 7FFFFFFFFFFFFFFFH) or (temp < 8000000000000000H)
        (* If a positive result is greater than 7FFFFFFFFFFFFFFFH
           or a negative result is less than 8000000000000000H *)
            THEN #DE; (* Divide error *)
            ELSE
                RAX ← temp;
                RDX ← RDX:RAX SignedModulus SRC;
        FI;
FI;


Flags Affected

The CF, OF, SF, ZF, AF, and PF flags are undefined. 

Protected Mode Exceptions 

#DE             If the source operand (divisor) is 0.
                The signed result (quotient) is too large for the destination.
#GP(0)          If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
                If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
                selector.
#SS(0)          If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0)          If alignment checking is enabled and an unaligned memory reference is made while the
                current privilege level is 3.
#UD             If the LOCK prefix is used.


Real-Address Mode Exceptions 


#DE             If the source operand (divisor) is 0.
                The signed result (quotient) is too large for the destination.
#GP             If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS             If a memory operand effective address is outside the SS segment limit.
#UD             If the LOCK prefix is used.


Virtual-8086 Mode Exceptions

#DE             If the source operand (divisor) is 0.
                The signed result (quotient) is too large for the destination.
#GP(0)          If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
#SS(0)          If a memory operand effective address is outside the SS segment limit.
#PF(fault-code) If a page fault occurs.
#AC(0)          If alignment checking is enabled and an unaligned memory reference is made.
#UD             If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0)          If a memory address referencing the SS segment is in a non-canonical form.
#GP(0)          If the memory address is in a non-canonical form.
#DE             If the source operand (divisor) is 0.
                If the quotient is too large for the designated register.
#PF(fault-code) If a page fault occurs.
#AC(0)          If alignment checking is enabled and an unaligned memory reference is made while the
                current privilege level is 3.
#UD             If the LOCK prefix is used.
IMUL—Signed Multiply


Opcode            Instruction             Op/En  64-Bit Mode  Compat/Leg Mode  Description
F6 /5             IMUL r/m8*              M      Valid        Valid            AX ← AL * r/m byte.
F7 /5             IMUL r/m16              M      Valid        Valid            DX:AX ← AX * r/m word.
F7 /5             IMUL r/m32              M      Valid        Valid            EDX:EAX ← EAX * r/m32.
REX.W + F7 /5     IMUL r/m64              M      Valid        N.E.             RDX:RAX ← RAX * r/m64.
0F AF /r          IMUL r16, r/m16         RM     Valid        Valid            word register ← word register * r/m16.
0F AF /r          IMUL r32, r/m32         RM     Valid        Valid            doubleword register ← doubleword register * r/m32.
REX.W + 0F AF /r  IMUL r64, r/m64         RM     Valid        N.E.             Quadword register ← Quadword register * r/m64.
6B /r ib          IMUL r16, r/m16, imm8   RMI    Valid        Valid            word register ← r/m16 * sign-extended immediate byte.
6B /r ib          IMUL r32, r/m32, imm8   RMI    Valid        Valid            doubleword register ← r/m32 * sign-extended immediate byte.
REX.W + 6B /r ib  IMUL r64, r/m64, imm8   RMI    Valid        N.E.             Quadword register ← r/m64 * sign-extended immediate byte.
69 /r iw          IMUL r16, r/m16, imm16  RMI    Valid        Valid            word register ← r/m16 * immediate word.
69 /r id          IMUL r32, r/m32, imm32  RMI    Valid        Valid            doubleword register ← r/m32 * immediate doubleword.
REX.W + 69 /r id  IMUL r64, r/m64, imm32  RMI    Valid        N.E.             Quadword register ← r/m64 * immediate doubleword.

NOTES: 

* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.


Instruction Operand Encoding

Op/En  Operand 1           Operand 2       Operand 3    Operand 4
M      ModRM:r/m (r, w)    NA              NA           NA
RM     ModRM:reg (r, w)    ModRM:r/m (r)   NA           NA
RMI    ModRM:reg (r, w)    ModRM:r/m (r)   imm8/16/32   NA


Description 

Performs a signed multiplication of two operands. This instruction has three forms, depending on the number of
operands.

• One-operand form — This form is identical to that used by the MUL instruction. Here, the source operand (in 
a general-purpose register or memory location) is multiplied by the value in the AL, AX, EAX, or RAX register 
(depending on the operand size) and the product (twice the size of the input operand) is stored in the AX, 
DX:AX, EDX:EAX, or RDX:RAX registers, respectively. 

• Two-operand form — With this form the destination operand (the first operand) is multiplied by the source 
operand (second operand). The destination operand is a general-purpose register and the source operand is an 
immediate value, a general-purpose register, or a memory location. The intermediate product (twice the size of 
the input operand) is truncated and stored in the destination operand location. 

• Three-operand form — This form requires a destination operand (the first operand) and two source operands 
(the second and the third operands). Here, the first source operand (which can be a general-purpose register 
or a memory location) is multiplied by the second source operand (an immediate value). The intermediate 
product (twice the size of the first source operand) is truncated and stored in the destination operand (a 
general-purpose register). 
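The following is a minimal C sketch of the one-operand form. It models `IMUL r/m8` (F6 /5) in software and assumes two's-complement arithmetic. The `imul_r8` function and `imul8_result` type are hypothetical names for illustration; they are not an Intel or compiler API. The flag test mirrors the `SignExtend(TMP_XP[7:0]) = TMP_XP` check in the Operation section.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical result type: the 16-bit AX value plus the shared CF/OF state. */
typedef struct {
    uint16_t ax;
    bool cf_of; /* IMUL always sets or clears CF and OF together */
} imul8_result;

/* Software model of IMUL r/m8 (F6 /5): AX <- AL * src, both operands signed. */
static imul8_result imul_r8(int8_t al, int8_t src) {
    int16_t product = (int16_t)(al * src);   /* exact: |product| <= 16384 */
    imul8_result r;
    r.ax = (uint16_t)product;                /* two's-complement bit pattern stored in AX */
    /* CF = OF = 0 only when SignExtend(TMP_XP[7:0]) == TMP_XP,
       that is, when the product fits in a signed byte. */
    r.cf_of = product < INT8_MIN || product > INT8_MAX;
    return r;
}
```

For example, -2 times 3 leaves FFFAH in AX with CF = OF = 0, while 16 times 16 gives 0100H with both flags set.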


When an immediate value is used as an operand, it is sign-extended to the length of the destination operand
format.

The CF and OF flags are set when the signed integer value of the intermediate product differs from the sign-extended,
operand-size-truncated product; otherwise the CF and OF flags are cleared.

The three forms of the IMUL instruction are similar in that the product is calculated at twice the length of the
operands. With the one-operand form, the product is stored exactly in the destination. With the two- and
three-operand forms, however, the result is truncated to the length of the destination before it is stored in the
destination register. Because of this truncation, the CF or OF flag should be tested to ensure that no significant bits
are lost.
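The truncation and flag rule for the two- and three-operand forms can be sketched in C for the 32-bit operand size. `imul32_truncated` is a hypothetical helper. It computes the exact 64-bit product (TMP_XP), keeps the low 32 bits as DEST, and sets CF/OF when sign-extending DEST does not reproduce the full product. The sketch assumes the compiler converts out-of-range values to `int32_t` by wrapping modulo 2^32, as GCC and Clang do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Software model of the 32-bit two-/three-operand IMUL forms.
   Returns the truncated DEST and reports CF/OF through *cf_of. */
static int32_t imul32_truncated(int32_t a, int32_t b, bool *cf_of) {
    int64_t tmp_xp = (int64_t)a * (int64_t)b;   /* exact double-width product */
    /* Keep only the low 32 bits. The uint32_t -> int32_t conversion of an
       out-of-range value is implementation-defined before C23; GCC and Clang
       wrap modulo 2^32, which matches the hardware truncation. */
    int32_t dest = (int32_t)(uint32_t)tmp_xp;
    *cf_of = (int64_t)dest != tmp_xp;           /* SignExtend(DEST) != TMP_XP */
    return dest;
}
```

For example, `imul32_truncated(0x10000, 0x10000, &f)` returns 0 with `f` set, because 2^32 does not fit in 32 bits.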

The two- and three-operand forms may also be used with unsigned operands because the lower half of the product
is the same regardless of whether the operands are signed or unsigned. The CF and OF flags, however, cannot be used to
determine if the upper half of the result is non-zero.
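The C sketch below illustrates both halves of the preceding paragraph for 32-bit operands. All helper names are hypothetical. It shows that the low 32 bits of the product are the same whether the bit patterns are read as signed or unsigned. It also shows that the signed-overflow condition IMUL reports in CF/OF can be clear even when the unsigned product has a non-zero upper half.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Interpret a 32-bit pattern as a two's-complement signed value (fully defined C). */
static int64_t as_signed32(uint32_t x) {
    return (x & 0x80000000u) ? (int64_t)x - 0x100000000LL : (int64_t)x;
}

/* Low 32 bits of the product when the operands are treated as signed. */
static uint32_t low32_signed(uint32_t a, uint32_t b) {
    return (uint32_t)(as_signed32(a) * as_signed32(b));   /* conversion is mod 2^32 */
}

/* Low 32 bits of the product when the operands are treated as unsigned. */
static uint32_t low32_unsigned(uint32_t a, uint32_t b) {
    return (uint32_t)((uint64_t)a * b);
}

/* CF/OF as IMUL would report them: set if the signed product does not fit in 32 bits. */
static bool imul_cf_of(uint32_t a, uint32_t b) {
    int64_t full = as_signed32(a) * as_signed32(b);
    return full < INT32_MIN || full > INT32_MAX;
}

/* True if the unsigned product needs more than 32 bits. */
static bool unsigned_upper_nonzero(uint32_t a, uint32_t b) {
    return ((uint64_t)a * b) >> 32 != 0;
}
```

Take a = FFFFFFFFH and b = 2. The signed product is -2, so CF = OF = 0. The unsigned product is 1FFFFFFFEH, whose upper half is non-zero. Software that needs unsigned overflow detection should use MUL, which sets CF and OF when the upper half of its product is non-zero.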

In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.R prefix permits access to additional
registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. Use of REX.W modifies the three
forms of the instruction as follows.

• One-operand form — The source operand (in a 64-bit general-purpose register or memory location) is
multiplied by the value in the RAX register and the product is stored in the RDX:RAX registers. 

• Two-operand form — The source operand is promoted to 64 bits if it is a register or a memory location. The 
destination operand is promoted to 64 bits. 

• Three-operand form — The first source operand (either a register or a memory location) and destination 
operand are promoted to 64 bits. If the source operand is an immediate, it is sign extended to 64 bits. 

Operation 

IF (NumberOfOperands = 1)
    THEN IF (OperandSize = 8)
        THEN
            TMP_XP ← AL * SRC (* Signed multiplication; TMP_XP is a signed integer at twice the width of the SRC *);
            AX ← TMP_XP[15:0];
            IF SignExtend(TMP_XP[7:0]) = TMP_XP
                THEN CF ← 0; OF ← 0;
                ELSE CF ← 1; OF ← 1; FI;
        ELSE IF OperandSize = 16
            THEN
                TMP_XP ← AX * SRC (* Signed multiplication; TMP_XP is a signed integer at twice the width of the SRC *)
                DX:AX ← TMP_XP[31:0];
                IF SignExtend(TMP_XP[15:0]) = TMP_XP
                    THEN CF ← 0; OF ← 0;
                    ELSE CF ← 1; OF ← 1; FI;
            ELSE IF OperandSize = 32
                THEN
                    TMP_XP ← EAX * SRC (* Signed multiplication; TMP_XP is a signed integer at twice the width of the SRC *)
                    EDX:EAX ← TMP_XP[63:0];
                    IF SignExtend(TMP_XP[31:0]) = TMP_XP
                        THEN CF ← 0; OF ← 0;
                        ELSE CF ← 1; OF ← 1; FI;
                ELSE (* OperandSize = 64 *)
                    TMP_XP ← RAX * SRC (* Signed multiplication; TMP_XP is a signed integer at twice the width of the SRC *)
                    RDX:RAX ← TMP_XP[127:0];
                    IF SignExtend(TMP_XP[63:0]) = TMP_XP
                        THEN CF ← 0; OF ← 0;
                        ELSE CF ← 1; OF ← 1; FI;
                FI;
            FI;
        FI;
    ELSE IF (NumberOfOperands = 2)
        THEN
            TMP_XP ← DEST * SRC (* Signed multiplication; TMP_XP is a signed integer at twice the width of the SRC *)
            DEST ← TruncateToOperandSize(TMP_XP);
            IF SignExtend(DEST) ≠ TMP_XP
                THEN CF ← 1; OF ← 1;
                ELSE CF ← 0; OF ← 0; FI;
        ELSE (* NumberOfOperands = 3 *)
            TMP_XP ← SRC1 * SRC2 (* Signed multiplication; TMP_XP is a signed integer at twice the width of the SRC1 *)
            DEST ← TruncateToOperandSize(TMP_XP);
            IF SignExtend(DEST) ≠ TMP_XP
                THEN CF ← 1; OF ← 1;
                ELSE CF ← 0; OF ← 0; FI;
    FI;
FI;

Flags Affected 

For the one-operand form of the instruction, the CF and OF flags are set when significant bits are carried into the
upper half of the result and cleared when the result fits exactly in the lower half of the result. For the two- and 
three-operand forms of the instruction, the CF and OF flags are set when the result must be truncated to fit in the 
destination operand size and cleared when the result fits exactly in the destination operand size. The SF, ZF, AF, and 
PF flags are undefined. 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL
segment selector.

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 
64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.
IN—Input from Port


Opcode  Instruction   Op/En  64-Bit Mode  Compat/Leg Mode  Description
E4 ib   IN AL, imm8   I      Valid        Valid            Input byte from imm8 I/O port address into AL.
E5 ib   IN AX, imm8   I      Valid        Valid            Input word from imm8 I/O port address into AX.
E5 ib   IN EAX, imm8  I      Valid        Valid            Input dword from imm8 I/O port address into EAX.
EC      IN AL,DX      NP     Valid        Valid            Input byte from I/O port in DX into AL.
ED      IN AX,DX      NP     Valid        Valid            Input word from I/O port in DX into AX.
ED      IN EAX,DX     NP     Valid        Valid            Input doubleword from I/O port in DX into EAX.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
I      imm8       NA         NA         NA
NP     NA         NA         NA         NA


Description 

Copies the value from the I/O port specified with the second operand (source operand) to the destination operand 
(first operand). The source operand can be a byte-immediate or the DX register; the destination operand can be 
register AL, AX, or EAX, depending on the size of the port being accessed (8, 16, or 32 bits, respectively). Using the 
DX register as a source operand allows I/O port addresses from 0 to 65,535 to be accessed; using a byte immediate
allows I/O port addresses 0 to 255 to be accessed.

When accessing an 8-bit I/O port, the opcode determines the port size; when accessing a 16- or 32-bit I/O port,
the operand-size attribute determines the port size. At the machine-code level, I/O instructions are shorter when
they use the byte-immediate form, which can address only ports 0 to 255; in that case, the upper eight bits of the
16-bit port address are 0.

This instruction is only useful for accessing I/O ports located in the processor's I/O address space. See Chapter 18,
"Input/Output," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for more
information on accessing I/O ports in the I/O address space.

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Operation 

IF ((PE = 1) and ((CPL > IOPL) or (VM = 1)))
    THEN (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1)
            THEN (* I/O operation is not allowed *)
                #GP(0);
            ELSE (* I/O operation is allowed *)
                DEST ← SRC; (* Read from selected I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Read from selected I/O port *)
FI;
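The permission check in the pseudocode above can be modeled in C. This is a simplified sketch under stated assumptions:

- The `io_context` structure and `in_faults` function are hypothetical.
- The I/O permission bit map is passed in directly as an 8192-byte array, with one bit per port; a bit value of 1 means access is denied.
- Each byte of a 16- or 32-bit access is checked against its own bit.
- The TSS limit checks that a real processor also performs are not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical snapshot of the state the IN pseudocode consults. */
typedef struct {
    bool pe;              /* CR0.PE: protected mode enabled */
    bool vm;              /* EFLAGS.VM: virtual-8086 mode */
    unsigned cpl;         /* current privilege level, 0..3 */
    unsigned iopl;        /* EFLAGS.IOPL, 0..3 */
    const uint8_t *iopb;  /* I/O permission bit map: 8192 bytes, one bit per port */
} io_context;

/* Returns true if IN would raise #GP(0) for a `size`-byte access at `port`. */
static bool in_faults(const io_context *ctx, uint16_t port, unsigned size) {
    assert(size == 1 || size == 2 || size == 4);
    if (!(ctx->pe && (ctx->cpl > ctx->iopl || ctx->vm)))
        return false;                        /* real mode, or protected mode with CPL <= IOPL */
    for (unsigned i = 0; i < size; i++) {
        uint32_t p = (uint32_t)port + i;     /* every byte-sized port has its own bit */
        if (p > 0xFFFF)
            return true;                     /* sketch assumption: ports past FFFFH are denied */
        if (ctx->iopb[p / 8] & (1u << (p % 8)))
            return true;                     /* a permission bit is 1: access not allowed */
    }
    return false;
}
```

For example, with only the bit for port 60H set and CPL 3 > IOPL 0, a byte read of port 61H succeeds. A word read starting at 5FH faults because it also covers port 60H.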

Flags Affected 

None 
Protected Mode Exceptions

#GP(0) If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the
corresponding I/O permission bits in the TSS for the I/O port being accessed is 1.

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If any of the I/O permission bits in the TSS for the I/O port being accessed is 1. 

#PF(fault-code) If a page fault occurs. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#GP(0) If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the
corresponding I/O permission bits in the TSS for the I/O port being accessed is 1.

#UD If the LOCK prefix is used. 
INC—Increment by 1


Opcode          Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
FE /0           INC r/m8     M      Valid        Valid            Increment r/m byte by 1.
REX + FE /0     INC r/m8*    M      Valid        N.E.             Increment r/m byte by 1.
FF /0           INC r/m16    M      Valid        Valid            Increment r/m word by 1.
FF /0           INC r/m32    M      Valid        Valid            Increment r/m doubleword by 1.
REX.W + FF /0   INC r/m64    M      Valid        N.E.             Increment r/m quadword by 1.
40+ rw**        INC r16      O      N.E.         Valid            Increment word register by 1.
40+ rd          INC r32      O      N.E.         Valid            Increment doubleword register by 1.


NOTES:

* In 64-bit mode, r/m8 cannot be encoded to access the following byte registers if a REX prefix is used: AH, BH, CH, DH.
** 40H through 47H are REX prefixes in 64-bit mode.


Instruction Operand Encoding

Op/En  Operand 1           Operand 2  Operand 3  Operand 4
M      ModRM:r/m (r, w)    NA         NA         NA
O      opcode + rd (r, w)  NA         NA         NA


Description 

Adds 1 to the destination operand, while preserving the state of the CF flag. The destination operand can be a
register or a memory location. This instruction allows a loop counter to be updated without disturbing the CF flag.
(Use an ADD instruction with an immediate operand of 1 to perform an increment operation that does update the
CF flag.)

This instruction can be used with a LOCK prefix to allow the instruction to be executed atomically. 

In 64-bit mode, the INC r16 and INC r32 forms that use opcodes 40H through 47H are not encodable (because those opcodes are REX prefixes).
Otherwise, the instruction's 64-bit mode default operation size is 32 bits. Use of the REX.R prefix permits access to 
additional registers (R8-R15). Use of the REX.W prefix promotes operation to 64 bits. 

Operation 

DEST ← DEST + 1;


Flags Affected

The CF flag is not affected. The OF, SF, ZF, AF, and PF flags are set according to the result. 
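The C function below is a sketch of the flag behavior just described, modeling INC on a byte operand. The `flags8` structure and `inc8` function are hypothetical. The point is that every status flag listed above is recomputed from the result, while `cf` is never written.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical view of the status flags touched (or not) by INC. */
typedef struct { bool cf, of, sf, zf, af, pf; } flags8;

/* Software model of INC r/m8: DEST <- DEST + 1, with CF deliberately left untouched. */
static uint8_t inc8(uint8_t dest, flags8 *f) {
    uint8_t r = (uint8_t)(dest + 1);
    f->of = (r == 0x80);               /* only 7FH + 1 overflows into the sign bit */
    f->sf = (r & 0x80) != 0;
    f->zf = (r == 0);
    f->af = (dest & 0x0F) == 0x0F;     /* carry out of bit 3 */
    unsigned ones = 0;
    for (int i = 0; i < 8; i++)
        ones += (r >> i) & 1u;
    f->pf = (ones % 2) == 0;           /* PF is set when the result byte has even parity */
    /* f->cf is intentionally not modified */
    return r;
}
```

Incrementing FFH wraps to 00H and sets ZF, but a CF that was already 1 stays 1. This is the property that makes INC suitable for loop counters inside multi-precision carry chains.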


Protected Mode Exceptions

#GP(0) If the destination operand is located in a non-writable segment.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.
If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand.
Real-Address Mode Exceptions

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Virtual-8086 Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used but the destination is not a memory operand.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used but the destination is not a memory operand. 
INS/INSB/INSW/INSD-Input from Port to String


Opcode  Instruction   Op/En  64-Bit Mode  Compat/Leg Mode  Description
6C      INS m8, DX    NP     Valid        Valid            Input byte from I/O port specified in DX into memory location specified in ES:(E)DI or RDI.*
6D      INS m16, DX   NP     Valid        Valid            Input word from I/O port specified in DX into memory location specified in ES:(E)DI or RDI.*
6D      INS m32, DX   NP     Valid        Valid            Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI.*
6C      INSB          NP     Valid        Valid            Input byte from I/O port specified in DX into memory location specified with ES:(E)DI or RDI.*
6D      INSW          NP     Valid        Valid            Input word from I/O port specified in DX into memory location specified in ES:(E)DI or RDI.*
6D      INSD          NP     Valid        Valid            Input doubleword from I/O port specified in DX into memory location specified in ES:(E)DI or RDI.*


NOTES:

* In 64-bit mode, only 64-bit (RDI) and 32-bit (EDI) address sizes are supported. In non-64-bit mode, only 32-bit (EDI) and 16-bit (DI)
address sizes are supported.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Copies the data from the I/O port specified with the source operand (second operand) to the destination operand
(first operand). The source operand is an I/O port address (from 0 to 65,535) that is read from the DX register. The
destination operand is a memory location, the address of which is read from either the ES:DI, ES:EDI or the RDI
registers (depending on the address-size attribute of the instruction, 16, 32 or 64, respectively). (The ES segment
cannot be overridden with a segment override prefix.) The size of the I/O port being accessed (that is, the size of
the source and destination operands) is determined by the opcode for an 8-bit I/O port or by the operand-size
attribute of the instruction for a 16- or 32-bit I/O port.

At the assembly-code level, two forms of this instruction are allowed: the "explicit-operands" form and the
"no-operands" form. The explicit-operands form (specified with the INS mnemonic) allows the source and destination
operands to be specified explicitly. Here, the source operand must be "DX," and the destination operand should be 
a symbol that indicates the size of the I/O port and the destination address. This explicit-operands form is provided 
to allow documentation; however, note that the documentation provided by this form can be misleading. That is, 
the destination operand symbol must specify the correct type (size) of the operand (byte, word, or doubleword), 
but it does not have to specify the correct location. The location is always specified by the ES:(E)DI registers, 
which must be loaded correctly before the INS instruction is executed. 

The no-operands form provides "short forms" of the byte, word, and doubleword versions of the INS instructions. 
Here also DX is assumed by the processor to be the source operand and ES:(E)DI is assumed to be the destination 
operand. The size of the I/O port is specified with the choice of mnemonic: INSB (byte), INSW (word), or INSD 
(doubleword). 

After the byte, word, or doubleword is transferred from the I/O port to the memory location, the DI/EDI/RDI register
is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS register. (If the
DF flag is 0, the (E)DI register is incremented; if the DF flag is 1, the (E)DI register is decremented.) The (E)DI
register is incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for doubleword
operations.
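The pointer update described above takes only a few lines of C. The sketch below covers only the 32-bit (E)DI case, with unsigned wraparound. `ins_next_edi` is a hypothetical helper. Handling of the 16-bit DI and 64-bit RDI address sizes is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Models the automatic (E)DI adjustment after one INS/INSB/INSW/INSD transfer.
   size: 1 (byte), 2 (word), or 4 (doubleword); df: the EFLAGS.DF bit. */
static uint32_t ins_next_edi(uint32_t edi, unsigned size, bool df) {
    assert(size == 1 || size == 2 || size == 4);
    return df ? edi - size : edi + size;   /* DF = 0 increments, DF = 1 decrements */
}
```

With REP, this step repeats once per element. For example, an INSW loop with DF = 0 starting at EDI = 1000H leaves EDI at 1000H + 2 × (initial ECX).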
The INS, INSB, INSW, and INSD instructions can be preceded by the REP prefix for block input of ECX bytes, words,
or doublewords. See "REP/REPE/REPZ/REPNE/REPNZ—Repeat String Operation Prefix" in Chapter 4 of the Intel®
64 and IA-32 Architectures Software Developer's Manual, Volume 2B, for a description of the REP prefix.

These instructions are only useful for accessing I/O ports located in the processor's I/O address space. See Chapter 
18, "Input/Output," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 1, for more 
information on accessing I/O ports in the I/O address space. 

In 64-bit mode, the default address size is 64 bits; a 32-bit address size is supported by using the 67H prefix. The address
of the memory destination is specified by RDI or EDI. A 16-bit address size is not supported in 64-bit mode. The
operand size is not promoted.

These instructions may read from the I/O port without writing to the memory location if an exception or VM exit
occurs due to the write (e.g., #PF). If this would be problematic, for example because the I/O port read has side
effects, software should ensure the write to the memory location does not cause an exception or VM exit.

Operation 

IF ((PE = 1) and ((CPL > IOPL) or (VM = 1)))
    THEN (* Protected mode with CPL > IOPL or virtual-8086 mode *)
        IF (Any I/O Permission Bit for I/O port being accessed = 1)
            THEN (* I/O operation is not allowed *)
                #GP(0);
            ELSE (* I/O operation is allowed *)
                DEST ← SRC; (* Read from I/O port *)
        FI;
    ELSE (* Real Mode or Protected Mode with CPL ≤ IOPL *)
        DEST ← SRC; (* Read from I/O port *)
FI;

Non-64-bit Mode:

IF (Byte transfer)
    THEN IF DF = 0
        THEN (E)DI ← (E)DI + 1;
        ELSE (E)DI ← (E)DI - 1; FI;
    ELSE IF (Word transfer)
        THEN IF DF = 0
            THEN (E)DI ← (E)DI + 2;
            ELSE (E)DI ← (E)DI - 2; FI;
        ELSE (* Doubleword transfer *)
            IF DF = 0
                THEN (E)DI ← (E)DI + 4;
                ELSE (E)DI ← (E)DI - 4; FI;
        FI;
FI;

64-bit Mode:

IF (Byte transfer)
    THEN IF DF = 0
        THEN (E|R)DI ← (E|R)DI + 1;
        ELSE (E|R)DI ← (E|R)DI - 1; FI;
    ELSE IF (Word transfer)
        THEN IF DF = 0
            THEN (E|R)DI ← (E|R)DI + 2;
            ELSE (E|R)DI ← (E|R)DI - 2; FI;
        ELSE (* Doubleword transfer *)
            IF DF = 0
                THEN (E|R)DI ← (E|R)DI + 4;
                ELSE (E|R)DI ← (E|R)DI - 4; FI;
        FI;
FI;

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the
corresponding I/O permission bits in the TSS for the I/O port being accessed is 1.

If the destination is located in a non-writable segment. 

If an illegal memory operand effective address in the ES segment is given.
#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the 

current privilege level is 3. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) If any of the I/O permission bits in the TSS for the I/O port being accessed is 1.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the CPL is greater than (has less privilege than) the I/O privilege level (IOPL) and any of the
corresponding I/O permission bits in the TSS for the I/O port being accessed is 1.

If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.
INSERTPS—Insert Scalar Single-Precision Floating-Point Value


Opcode/Instruction                     Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
66 0F 3A 21 /r ib                      RMI    V/V                     SSE4_1              Insert a single-precision floating-point value selected by imm8 from xmm2/m32 into xmm1 at the destination element specified by imm8, and zero out destination elements in xmm1 as indicated in imm8.
INSERTPS xmm1, xmm2/m32, imm8
VEX.NDS.128.66.0F3A.WIG 21 /r ib       RVMI   V/V                     AVX                 Insert a single-precision floating-point value selected by imm8 from xmm3/m32 and merge it with the values in xmm2 at the destination element specified by imm8; write out the result and zero out destination elements in xmm1 as indicated in imm8.
VINSERTPS xmm1, xmm2, xmm3/m32, imm8
EVEX.NDS.128.66.0F3A.W0 21 /r ib       T1S    V/V                     AVX512F             Insert a single-precision floating-point value selected by imm8 from xmm3/m32 and merge it with the values in xmm2 at the destination element specified by imm8; write out the result and zero out destination elements in xmm1 as indicated in imm8.
VINSERTPS xmm1, xmm2, xmm3/m32, imm8


Instruction Operand Encoding

Op/En  Operand 1         Operand 2      Operand 3      Operand 4
RMI    ModRM:reg (r, w)  ModRM:r/m (r)  Imm8           NA
RVMI   ModRM:reg (w)     VEX.vvvv       ModRM:r/m (r)  Imm8
T1S    ModRM:reg (w)     EVEX.vvvv      ModRM:r/m (r)  Imm8


Description 

(register source form)

Select a single-precision floating-point element from the second source as indicated by the Count_S bits of the immediate
operand and insert it into the first source at the location indicated by the Count_D bits of the immediate operand.
Store the result in the destination and zero out destination elements based on the ZMask bits of the immediate
operand.

(memory source form) 

Load a floating-point element from a 32-bit memory location and insert it into the first source at the
location indicated by the Count_D bits of the immediate operand. Store the result in the destination and zero out destination
elements based on the ZMask bits of the immediate operand.

128-bit Legacy SSE version: The first source register is an XMM register. The second source operand is either an
XMM register or a 32-bit memory location. The destination is not distinct from the first source XMM register and the
upper bits (MAX_VL-1:128) of the corresponding register destination are unmodified.

VEX.128 and EVEX encoded version: The destination and first source register is an XMM register. The second
source operand is either an XMM register or a 32-bit memory location. The upper bits (MAX_VL-1:128) of the
corresponding register destination are zeroed.

Attempting to execute VINSERTPS encoded with VEX.L = 1 causes an #UD exception.
Operation

VINSERTPS (VEX.128 and EVEX encoded version)

IF (SRC = REG) THEN COUNT_S ← imm8[7:6]
    ELSE COUNT_S ← 0
COUNT_D ← imm8[5:4]
ZMASK ← imm8[3:0]
CASE (COUNT_S) OF
    0: TMP ← SRC2[31:0]
    1: TMP ← SRC2[63:32]
    2: TMP ← SRC2[95:64]
    3: TMP ← SRC2[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0] ← TMP
       TMP2[127:32] ← SRC1[127:32]
    1: TMP2[63:32] ← TMP
       TMP2[31:0] ← SRC1[31:0]
       TMP2[127:64] ← SRC1[127:64]
    2: TMP2[95:64] ← TMP
       TMP2[63:0] ← SRC1[63:0]
       TMP2[127:96] ← SRC1[127:96]
    3: TMP2[127:96] ← TMP
       TMP2[95:0] ← SRC1[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0] ← 00000000H
    ELSE DEST[31:0] ← TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32] ← 00000000H
    ELSE DEST[63:32] ← TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64] ← 00000000H
    ELSE DEST[95:64] ← TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96] ← 00000000H
    ELSE DEST[127:96] ← TMP2[127:96]
DEST[MAX_VL-1:128] ← 0


INSERTPS (128-bit Legacy SSE version)

IF (SRC = REG) THEN COUNT_S ← imm8[7:6]
    ELSE COUNT_S ← 0
COUNT_D ← imm8[5:4]
ZMASK ← imm8[3:0]
CASE (COUNT_S) OF
    0: TMP ← SRC[31:0]
    1: TMP ← SRC[63:32]
    2: TMP ← SRC[95:64]
    3: TMP ← SRC[127:96]
ESAC;
CASE (COUNT_D) OF
    0: TMP2[31:0] ← TMP
       TMP2[127:32] ← DEST[127:32]
    1: TMP2[63:32] ← TMP
       TMP2[31:0] ← DEST[31:0]
       TMP2[127:64] ← DEST[127:64]
    2: TMP2[95:64] ← TMP
       TMP2[63:0] ← DEST[63:0]
       TMP2[127:96] ← DEST[127:96]
    3: TMP2[127:96] ← TMP
       TMP2[95:0] ← DEST[95:0]
ESAC;
IF (ZMASK[0] = 1) THEN DEST[31:0] ← 00000000H
    ELSE DEST[31:0] ← TMP2[31:0]
IF (ZMASK[1] = 1) THEN DEST[63:32] ← 00000000H
    ELSE DEST[63:32] ← TMP2[63:32]
IF (ZMASK[2] = 1) THEN DEST[95:64] ← 00000000H
    ELSE DEST[95:64] ← TMP2[95:64]
IF (ZMASK[3] = 1) THEN DEST[127:96] ← 00000000H
    ELSE DEST[127:96] ← TMP2[127:96]
DEST[MAX_VL-1:128] (Unmodified)
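The pseudocode above can be mirrored in portable C for experimentation. The sketch makes these assumptions:

- `insertps_model` is a hypothetical name.
- XMM registers are represented as `float[4]`, with element 0 holding bits 31:0.
- Only the legacy SSE register-source form is modeled. For the memory form, COUNT_S is 0 and the source is a single float.

```c
#include <assert.h>
#include <stdint.h>

/* Software model of INSERTPS xmm1, xmm2, imm8 (legacy SSE, register source).
   dst is both the first source and the destination, as in the legacy form. */
static void insertps_model(float dst[4], const float src[4], uint8_t imm8) {
    unsigned count_s = (imm8 >> 6) & 3u;   /* COUNT_S = imm8[7:6]: source element */
    unsigned count_d = (imm8 >> 4) & 3u;   /* COUNT_D = imm8[5:4]: destination element */
    unsigned zmask = imm8 & 0xFu;          /* ZMASK = imm8[3:0] */
    float tmp = src[count_s];              /* read before writing, in case dst aliases src */
    dst[count_d] = tmp;                    /* the other elements keep their old values */
    for (unsigned i = 0; i < 4; i++)
        if (zmask & (1u << i))
            dst[i] = 0.0f;                 /* zeroing is applied after the insertion */
}
```

For instance, imm8 = 90H (COUNT_S = 2, COUNT_D = 1, ZMASK = 0) copies source element 2 into destination element 1. A following imm8 = 09H then zeroes elements 0 and 3, regardless of what was just inserted into element 0.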

Intel C/C++ Compiler Intrinsic Equivalent

VINSERTPS __m128 _mm_insert_ps(__m128 dst, __m128 src, const int nidx);

INSERTPS __m128 _mm_insert_ps(__m128 dst, __m128 src, const int nidx);

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

Non-EVEX-encoded instruction, see Exceptions Type 5; additionally
#UD If VEX.L = 1.

EVEX-encoded instruction, see Exceptions Type E9NF. 
INT n/INTO/INT 3—Call to Interrupt Procedure


Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
CC      INT 3        NP     Valid        Valid            Interrupt 3: trap to debugger.
CD ib   INT imm8     I      Valid        Valid            Interrupt vector specified by immediate byte.
CE      INTO         NP     Invalid      Valid            Interrupt 4: if overflow flag is 1.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA
I      imm8       NA         NA         NA


Description 

The INT n instruction generates a call to the interrupt or exception handler specified with the destination operand
(see the section titled "Interrupts and Exceptions" in Chapter 6 of the Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volume 1). The destination operand specifies a vector from 0 to 255, encoded as an 8-bit
unsigned immediate value. Each vector provides an index to a gate descriptor in the IDT. The first 32 vectors are
reserved by Intel for system use. Some of these vectors are used for internally generated exceptions.

The INT n instruction is the general mnemonic for executing a software-generated call to an interrupt handler. The
INTO instruction is a special mnemonic for calling the overflow exception (#OF), exception 4. INTO checks the OF
flag in the EFLAGS register and calls the overflow interrupt handler if the OF flag is set to 1. (The INTO instruction
cannot be used in 64-bit mode.)

The INT 3 instruction generates a special one-byte opcode (CC) that is intended for calling the debug exception
handler. (This one-byte form is valuable because it can be used to replace the first byte of any instruction with a
breakpoint, including other one-byte instructions, without overwriting other code.) To further support its function
as a debug breakpoint, the interrupt generated with the CC opcode also differs from the regular software interrupts
as follows:

• Interrupt redirection does not happen when in VME mode; the interrupt is handled by a protected-mode 
handler. 

• The virtual-8086 mode IOPL checks do not occur. The interrupt is taken without faulting at any IOPL level.

Note that the "normal" 2-byte opcode for INT 3 (CD03) does not have these special features. Intel and Microsoft 
assemblers will not generate the CD03 opcode from any mnemonic, but this opcode can be created by direct 
numeric code definition or by self-modifying code. 
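The breakpoint technique described above amounts to saving one byte and overwriting it with CCH. The C sketch below performs that bookkeeping on an ordinary byte buffer. The `breakpoint` type and the `bp_set` and `bp_clear` functions are hypothetical names. A real debugger would also have to make the target code writable and handle the resulting trap, which this sketch does not show.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical software-breakpoint record: where it sits and the byte it replaced. */
typedef struct {
    size_t offset;
    uint8_t saved;
} breakpoint;

/* Replace the first byte of the instruction at `offset` with INT 3 (CCH).
   Because CCH is one byte, no neighboring instruction is touched. */
static breakpoint bp_set(uint8_t *code, size_t offset) {
    breakpoint bp = { offset, code[offset] };
    code[offset] = 0xCC;
    return bp;
}

/* Restore the original byte once the breakpoint is no longer needed. */
static void bp_clear(uint8_t *code, breakpoint bp) {
    code[bp.offset] = bp.saved;
}
```

Placing the breakpoint on a one-byte RET (C3H) that follows a NOP changes only that byte. The two-byte CD 03 encoding would instead spill into whatever follows.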

The action of the INT n instruction (including the INTO and INT 3 instructions) is similar to that of a far call made 
with the CALL instruction. The primary difference is that with the INT n instruction, the EFLAGS register is pushed 
onto the stack before the return address. (The return address is a far address consisting of the current values of 
the CS and EIP registers.) Returns from interrupt procedures are handled with the IRET instruction, which pops the 
EFLAGS information and return address from the stack. 

The vector specifies an interrupt descriptor in the interrupt descriptor table (IDT); that is, it provides an index into the
IDT. The selected interrupt descriptor in turn contains a pointer to an interrupt or exception handler procedure.
In protected mode, the IDT contains an array of 8-byte descriptors, each of which is an interrupt gate, trap gate,
or task gate. In real-address mode, the IDT is an array of 4-byte far pointers (2-byte code segment selector and
a 2-byte instruction pointer), each of which points directly to a procedure in the selected segment. (Note that in
real-address mode, the IDT is called the interrupt vector table, and its pointers are called interrupt vectors.)
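The indexing described above can be sketched in C. The helpers are hypothetical and rest on these assumptions:

- The table starts at offset 0 of the supplied buffer.
- A real-address-mode entry stores the instruction pointer in its low word and the code segment in its high word, in little-endian order.
- Only the 8-byte protected-mode and 4-byte real-address-mode layouts described here are modeled. IA-32e mode uses 16-byte gate descriptors, which are not covered.

```c
#include <assert.h>
#include <stdint.h>

/* Byte offset of the table entry selected by `vector`:
   8-byte gate descriptors in protected mode, 4-byte far pointers in real-address mode. */
static uint32_t idt_entry_offset(uint8_t vector, int protected_mode) {
    return (uint32_t)vector * (protected_mode ? 8u : 4u);
}

/* Read a real-address-mode interrupt vector (far pointer) from an IVT image. */
static void ivt_lookup(const uint8_t *ivt, uint8_t vector, uint16_t *cs, uint16_t *ip) {
    const uint8_t *e = ivt + idt_entry_offset(vector, 0);
    *ip = (uint16_t)(e[0] | (e[1] << 8));   /* low word: instruction pointer */
    *cs = (uint16_t)(e[2] | (e[3] << 8));   /* high word: code segment */
}
```

For vector 21H, the real-address-mode entry starts at byte 84H and the protected-mode gate at byte 108H.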

The following decision table indicates which action in the lower portion of the table is taken given the conditions in 
the upper portion of the table. Each Y in the lower section of the decision table represents a procedure defined in 
the "Operation" section for this instruction (except #GP). 


Vol.2A 3-457

INSTRUCTION SET REFERENCE, A-L


Table 3-51. Decision Table

PE                                0   1        1     1        1             1             1    1
VM                                -   -        -     -        -             0             1    1
IOPL                              -   -        -     -        -             -             <3   =3
DPL/CPL RELATIONSHIP              -   DPL<CPL  -     DPL>CPL  DPL=CPL or C  DPL<CPL & NC  -    -
INTERRUPT TYPE                    -   S/W      -     -        -             -             -    -
GATE TYPE                         -   -        Task  T/I      T/I           T/I           T/I  T/I

REAL-ADDRESS-MODE                 Y
PROTECTED-MODE                        Y        Y     Y        Y             Y             Y    Y
TRAP-OR-INTERRUPT-GATE                               Y        Y             Y             Y    Y
INTER-PRIVILEGE-LEVEL-INTERRUPT                                             Y
INTRA-PRIVILEGE-LEVEL-INTERRUPT                               Y
INTERRUPT-FROM-VIRTUAL-8086-MODE                                                               Y
TASK-GATE                                      Y
#GP                                   Y              Y                                    Y

NOTES:

-      Don't care.
Y      Yes, action taken.
Blank  Action not taken.
S/W    Software interrupt (INT n, INT 3, or INTO).
T/I    Trap or interrupt gate.
C      Conforming code segment.
NC     Nonconforming code segment.

When the processor is executing in virtual-8086 mode, the IOPL determines the action of the INT n instruction. If 
the IOPL is less than 3, the processor generates a #GP(selector) exception; if the IOPL is 3, the processor executes 
a protected mode interrupt to privilege level 0. The interrupt gate's DPL must be set to 3 and the target CPL of the 
interrupt handler procedure must be 0 to execute the protected mode interrupt to privilege level 0.
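Table 3-51 can be read as a small dispatch function. The Python sketch below walks the table's columns in order and returns the actions marked Y for the first column that matches. It is a simplified model, and the parameter names are hypothetical:

- "DPL < CPL" for the gate is checked only for software interrupts (column 2).
- The code-segment DPL comparisons (columns 4 to 6) use the new code segment's DPL.
- Extra checks from the Operation section are not modeled. For example, the requirement that the new code-segment DPL be 0 in virtual-8086 mode is omitted.

```python
def decision_table(pe: int, *, vm: int = 0, iopl: int = 0, software: bool = False,
                   gate_dpl: int = 0, task_gate: bool = False, cpl: int = 0,
                   code_dpl: int = 0, conforming: bool = False) -> list[str]:
    """Return the Table 3-51 actions (simplified) for one set of conditions."""
    if pe == 0:                                    # column 1
        return ["REAL-ADDRESS-MODE"]
    actions = ["PROTECTED-MODE"]
    if software and gate_dpl < cpl:                # column 2: S/W, DPL<CPL
        return actions + ["#GP"]
    if task_gate:                                  # column 3: task gate
        return actions + ["TASK-GATE"]
    actions.append("TRAP-OR-INTERRUPT-GATE")       # columns 4-8
    if vm == 1:
        if iopl < 3:                               # column 7
            return actions + ["#GP"]
        return actions + ["INTERRUPT-FROM-VIRTUAL-8086-MODE"]   # column 8
    if code_dpl > cpl:                             # column 4: DPL>CPL
        return actions + ["#GP"]
    if conforming or code_dpl == cpl:              # column 5: DPL=CPL or C
        return actions + ["INTRA-PRIVILEGE-LEVEL-INTERRUPT"]
    return actions + ["INTER-PRIVILEGE-LEVEL-INTERRUPT"]        # column 6: DPL<CPL & NC
```

Consider a hardware interrupt taken at CPL 3 whose handler lives in a nonconforming ring-0 code segment: `decision_table(1, cpl=3, code_dpl=0)`. The sketch returns PROTECTED-MODE, TRAP-OR-INTERRUPT-GATE, and INTER-PRIVILEGE-LEVEL-INTERRUPT, matching column 6.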

The interrupt descriptor table register (IDTR) specifies the base linear address and limit of the IDT. The initial base 
address value of the IDTR after the processor is powered up or reset is 0. 

Operation 

The following operational description applies not only to the INT n and INTO instructions, but also to external 
interrupts, nonmaskable interrupts (NMIs), and exceptions. Some of these events push an error code onto the stack.

The operational description specifies numerous checks whose failure may result in delivery of a nested exception. 
In these cases, the original event is not delivered. 

The operational description specifies the error code delivered by any nested exception. In some cases, the error 
code is specified with a pseudofunction error_code(num,idt,ext), where idt and ext are bit values. The 
pseudofunction produces an error code as follows: (1) if idt is 0, the error code is (num & FCH) | ext; (2) if idt is 1, 
the error code is (num << 3) | 2 | ext.

In many cases, the pseudofunction error_code is invoked with a pseudovariable EXT. The value of EXT depends on 
the nature of the event whose delivery encountered a nested exception: if that event is a software interrupt, EXT is 
0; otherwise, EXT is 1. 
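The error_code pseudofunction and the EXT rule translate into a few lines of Python. This is an illustrative sketch, not Intel code; the names mirror the pseudocode.

One assumption should be flagged. The text masks a selector with FCH, but the sketch applies the mask as FFFCH. That is, it clears only the two low-order bits of the full 16-bit selector, so selectors above FFH keep their upper index bits.

```python
def ext_bit(software_interrupt: bool) -> int:
    """EXT is 0 for software interrupts (INT n, INT 3, INTO), otherwise 1."""
    return 0 if software_interrupt else 1


def error_code(num: int, idt: int, ext: int) -> int:
    """Sketch of the error_code(num, idt, ext) pseudofunction."""
    if idt not in (0, 1) or ext not in (0, 1):
        raise ValueError("idt and ext must be single bits")
    if idt == 0:
        # num is a segment selector: clear its two low-order bits and put EXT
        # in bit 0. (Assumption: the FCH mask is applied as FFFCH.)
        return (num & 0xFFFC) | ext
    # num is a vector: it fills the index field (bits 15:3); bit 1 flags "IDT".
    return (num << 3) | 2 | ext


# Example: a not-present gate for vector 0EH met while delivering a hardware event.
print(hex(error_code(0x0E, 1, ext_bit(False))))  # 0x73
```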
IF PE = 0

THEN

GOTO REAL-ADDRESS-MODE;

ELSE (* PE = 1 *)

IF (VM = 1 and IOPL < 3 AND INT n)

THEN

#GP(0); (* Bit 0 of error code is 0 because INT n *)

ELSE (* Protected mode, IA-32e mode, or virtual-8086 mode interrupt *)

IF (IA32_EFER.LMA = 0)

THEN (* Protected mode, or virtual-8086 mode interrupt *)

GOTO PROTECTED-MODE;

ELSE (* IA-32e mode interrupt *)

GOTO IA-32e-MODE; 

FI; 

FI; 

FI; 

REAL-ADDRESS-MODE: 

IF ((vector_number << 2) + 3) is not within IDT limit
THEN #GP; FI;

IF stack not large enough for 6 bytes of return information
THEN #SS; FI;

Push (EFLAGS[15:0]);

IF <- 0; (* Clear interrupt flag *)

TF <- 0; (* Clear trap flag *)

AC <- 0; (* Clear AC flag *)

Push(CS);

Push(IP);

(* No error codes are pushed in real-address mode *)

CS <- IDT(Descriptor (vector_number << 2), selector));

EIP <- IDT(Descriptor (vector_number << 2), offset)); (* 16 bit offset AND 0000FFFFH *)
END; 
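The REAL-ADDRESS-MODE routine above can be approximated in Python. This sketch rests on simplifying assumptions:

- The interrupt vector table is passed in as a byte string starting at the IDT base.
- Each 4-byte entry stores the 16-bit offset followed by the 16-bit segment, in little-endian order.
- The stack is modeled as a plain list.
- The stack-size (#SS) check is omitted.

The function name and the EFLAGS bit constants are illustrative.

```python
import struct

TF = 1 << 8    # trap flag
IF = 1 << 9    # interrupt-enable flag
AC = 1 << 18   # alignment-check flag


def real_mode_int(ivt: bytes, vector: int, eflags: int, cs: int, ip: int,
                  stack: list, idt_limit: int = 0x3FF):
    """Deliver `vector` in real-address mode; return (new_cs, new_ip, new_eflags)."""
    offset = vector << 2
    if offset + 3 > idt_limit:
        raise IndexError("vector outside IDT limit (#GP)")
    new_ip, new_cs = struct.unpack_from("<HH", ivt, offset)  # offset word, then segment
    stack.append(eflags & 0xFFFF)   # Push(EFLAGS[15:0])
    eflags &= ~(IF | TF | AC)       # clear IF, TF, and AC
    stack.append(cs)                # Push(CS)
    stack.append(ip)                # Push(IP); no error code in real-address mode
    return new_cs, new_ip, eflags


ivt = bytearray(0x400)
struct.pack_into("<HH", ivt, 0x21 << 2, 0x1234, 0xF000)  # vector 21H -> F000:1234
stack = []
print(real_mode_int(bytes(ivt), 0x21, 0x40302, 0x0700, 0x0100, stack))
```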

PROTECTED-MODE: 

IF ((vector_number << 3) + 7) is not within IDT limits
or selected IDT descriptor is not an interrupt-, trap-, or task-gate type 
THEN #GP(error_code(vector_number,1,EXT)); FI; 

(* idt operand to error_code set because vector is used *) 

IF software interrupt (* Generated by INT n, INT3, or INTO *) 

THEN 

IF gate DPL < CPL (* PE = 1, DPL < CPL, software interrupt *) 

THEN #GP(error_code(vector_number,1,0)); FI; 

(* idt operand to error_code set because vector is used *) 

(* ext operand to error_code is 0 because INT n, INT3, or INTO*) 

FI; 

IF gate not present 

THEN #NP(error_code(vector_number,1,EXT)); FI; 

(* idt operand to error_code set because vector is used *) 

IF task gate (* Specified in the selected interrupt table descriptor *) 

THEN GOTO TASK-GATE; 

ELSE GOTO TRAP-OR-INTERRUPT-GATE; (* PE = 1, trap/interrupt gate *) 

FI;

END; 

IA-32e-MODE: 

IF INTO and CS.L = 1 (64-bit mode) 

THEN #UD; 
FI;

IF ((vector_number << 4) + 15) is not in IDT limits
or selected IDT descriptor is not an interrupt- or trap-gate type
THEN #GP(error_code(vector_number,1,EXT));

(* idt operand to error_code set because vector is used *) 

FI; 

IF software interrupt (* Generated by INT n, INT 3, or INTO *) 

THEN 

IF gate DPL < CPL (* PE = 1, DPL < CPL, software interrupt *)

THEN #GP(error_code(vector_number,1,0)); 

(* idt operand to error_code set because vector is used *) 

(* ext operand to error_code is 0 because INT n, INT3, or INTO*) 

FI; 

FI; 

IF gate not present 

THEN #NP(error_code(vector_number,1,EXT));

(* idt operand to error_code set because vector is used *) 

FI; 

GOTO TRAP-OR-INTERRUPT-GATE; (* Trap/interrupt gate *) 

END; 

TASK-GATE: (* PE = 1, task gate *) 

Read TSS selector in task gate (IDT descriptor); 

IF local/global bit is set to local or index not within GDT limits 
THEN #GP(error_code(TSS selector,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *) 

Access TSS descriptor in GDT; 

IF TSS descriptor specifies that the TSS is busy (low-order 5 bits set to 00001) 
THEN #GP(error_code(TSS selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

IF TSS not present

THEN #NP(error_code(TSS selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *) 
SWITCH-TASKS (with nesting) to TSS; 

IF interrupt caused by fault with error code 
THEN 

IF stack limit does not allow push of error code 
THEN #SS(EXT); FI; 

Push(error code); 

FI; 

IF EIP not within code segment limit 
THEN #GP(EXT); FI;

END; 

TRAP-OR-INTERRUPT-GATE: 

Read new code-segment selector for trap or interrupt gate (IDT descriptor); 

IF new code-segment selector is NULL 

THEN #GP(EXT); FI; (* Error code contains NULL selector *) 

IF new code-segment selector is not within its descriptor table limits 
THEN #GP(error_code(new code-segment selector,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *) 

Read descriptor referenced by new code-segment selector; 

IF descriptor does not indicate a code segment or new code-segment DPL > CPL 
THEN #GP(error_code(new code-segment selector,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *) 

IF new code-segment descriptor is not present
THEN #NP(error_code(new code-segment selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

IF new code segment is non-conforming with DPL < CPL
THEN

IF VM = 0
THEN

GOTO INTER-PRIVILEGE-LEVEL-INTERRUPT;

(* PE = 1, VM = 0, interrupt or trap gate, nonconforming code segment, DPL < CPL *)

ELSE (* VM = 1 *)

IF new code-segment DPL ≠ 0

THEN #GP(error_code(new code-segment selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

GOTO INTERRUPT-FROM-VIRTUAL-8086-MODE;

(* PE = 1, interrupt or trap gate, DPL < CPL, VM = 1 *)

FI; 

ELSE (* PE = 1, interrupt or trap gate, DPL ≥ CPL *)

IF VM = 1

THEN #GP(error_code(new code-segment selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

IF new code segment is conforming or new code-segment DPL = CPL
THEN

GOTO INTRA-PRIVILEGE-LEVEL-INTERRUPT;

ELSE (* PE = 1, interrupt or trap gate, nonconforming code segment, DPL > CPL *)
#GP(error_code(new code-segment selector,0,EXT));

(* idt operand to error_code is 0 because selector is used *)

FI;

FI;

END; 

INTER-PRIVILEGE-LEVEL-INTERRUPT: 

(* PE = 1, interrupt or trap gate, non-conforming code segment, DPL < CPL *) 

IF (IA32_EFER.LMA = 0) (* Not IA-32e mode *) 

THEN 

(* Identify stack-segment selector for new privilege level in current TSS *)

IF current TSS is 32-bit
THEN

TSSstackAddress <- (new code-segment DPL << 3) + 4;

IF (TSSstackAddress + 5) > current TSS limit

THEN #TS(error_code(current TSS selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

NewSS <- 2 bytes loaded from (TSS base + TSSstackAddress + 4);

NewESP <- 4 bytes loaded from (TSS base + TSSstackAddress);

ELSE (* current TSS is 16-bit *)

TSSstackAddress <- (new code-segment DPL << 2) + 2;
IF (TSSstackAddress + 3) > current TSS limit

THEN #TS(error_code(current TSS selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

NewSS <- 2 bytes loaded from (TSS base + TSSstackAddress + 2);

NewESP <- 2 bytes loaded from (TSS base + TSSstackAddress);

FI;
IF NewSS is NULL

THEN #TS(EXT); FI;

IF NewSS index is not within its descriptor-table limits
or NewSS RPL ≠ new code-segment DPL
THEN #TS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

Read new stack-segment descriptor for NewSS in GDT or LDT;

IF new stack-segment DPL ≠ new code-segment DPL
or new stack-segment type does not indicate writable data segment
THEN #TS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

IF NewSS is not present

THEN #SS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

ELSE (* IA-32e mode *)

IF IDT-gate IST = 0

THEN TSSstackAddress <- (new code-segment DPL << 3) + 4;

ELSE TSSstackAddress <- (IDT-gate IST << 3) + 28;

FI;

IF (TSSstackAddress + 7) > current TSS limit

THEN #TS(error_code(current TSS selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

NewRSP <- 8 bytes loaded from (current TSS base + TSSstackAddress);
NewSS <- new code-segment DPL; (* NULL selector with RPL = new CPL *)
FI; 

IF IDT gate is 32-bit 
THEN 

IF new stack does not have room for 24 bytes (error code pushed) 
or 20 bytes (no error code pushed) 

THEN #SS(error_code(NewSS,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *) 

ELSE

IF IDT gate is 16-bit
THEN

IF new stack does not have room for 12 bytes (error code pushed)
or 10 bytes (no error code pushed)

THEN #SS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)
ELSE (* 64-bit IDT gate *)

IF StackAddress is non-canonical 

THEN #SS(EXT); FI; (* Error code contains NULL selector *) 

FI; 

FI; 

IF (IA32_EFER.LMA = 0) (* Not IA-32e mode *) 

THEN 

IF instruction pointer from IDT gate is not within new code-segment limits
THEN #GP(EXT); FI; (* Error code contains NULL selector *)

ESP <- NewESP;

SS <- NewSS; (* Segment descriptor information also loaded *)

ELSE (* IA-32e mode *)

IF instruction pointer from IDT gate contains a non-canonical address
THEN #GP(EXT); FI; (* Error code contains NULL selector *)

RSP <- NewRSP AND FFFFFFFFFFFFFFF0H;

SS <- NewSS;

FI; 

IF IDT gate is 32-bit 
THEN 
CS:EIP <- Gate(CS:EIP); (* Segment descriptor information also loaded *)

ELSE

IF IDT gate is 16-bit
THEN

CS:IP <- Gate(CS:IP);

(* Segment descriptor information also loaded *)

ELSE (* 64-bit IDT gate *)

CS:RIP <- Gate(CS:RIP);

(* Segment descriptor information also loaded *) 

FI; 

FI; 

IF IDT gate is 32-bit 
THEN 

Push(far pointer to old stack); 

(* Old SS and ESP, 3 words padded to 4 *) 

Push(EFLAGS); 

Push(far pointer to return instruction); 

(* Old CS and EIP, 3 words padded to 4 *) 

Push(ErrorCode); (* If needed, 4 bytes *) 

ELSE 

IF IDT gate is 16-bit
THEN

Push(far pointer to old stack);

(* Old SS and SP, 2 words *)

Push(EFLAGS[15:0]);

Push(far pointer to return instruction); 

(* Old CS and IP, 2 words *) 

Push(ErrorCode); (* If needed, 2 bytes *) 

ELSE (* 64-bit IDT gate *) 

Push(far pointer to old stack); 

(* Old SS and SP, each an 8-byte push *) 

Push(RFLAGS); (* 8-byte push *) 

Push(far pointer to return instruction); 

(* Old CS and RIP, each an 8-byte push *) 
Push(ErrorCode); (* If needed, 8 bytes *)

FI; 

FI; 

CPL <- new code-segment DPL; 

CS(RPL) <- CPL;

IF IDT gate is interrupt gate

THEN IF <- 0 (* Interrupt flag set to 0, interrupts disabled *); FI;

TF <- 0;

VM <- 0;

RF <- 0;

NT <- 0;

END; 
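The TSSstackAddress arithmetic in the INTER-PRIVILEGE-LEVEL-INTERRUPT routine can be checked with a short Python sketch. It returns the byte offset of the stack-pointer field to read for a given target privilege level (or IST index). It also performs the limit test that the pseudocode turns into #TS.

The helper name and the "32"/"16"/"64" labels for the 32-bit, 16-bit, and IA-32e TSS formats are hypothetical.

```python
# Extra bytes that must also lie inside the TSS limit, per the pseudocode:
# (TSSstackAddress + 5), (TSSstackAddress + 3), and (TSSstackAddress + 7).
LIMIT_SLACK = {"32": 5, "16": 3, "64": 7}


def tss_stack_address(tss_kind: str, new_dpl: int, tss_limit: int, ist: int = 0) -> int:
    """Offset in the current TSS of the stack pointer to load for the new stack."""
    if tss_kind == "32":
        addr = (new_dpl << 3) + 4        # ESPn; SSn is read at addr + 4
    elif tss_kind == "16":
        addr = (new_dpl << 2) + 2        # SPn; SSn is read at addr + 2
    elif tss_kind == "64":
        # IA-32e mode: a nonzero IST field in the gate overrides RSPn.
        addr = (ist << 3) + 28 if ist else (new_dpl << 3) + 4
    else:
        raise ValueError("tss_kind must be '32', '16', or '64'")
    if addr + LIMIT_SLACK[tss_kind] > tss_limit:
        raise IndexError("stack pointer lies beyond the TSS limit (#TS)")
    return addr


for kind, dpl, ist in (("32", 0, 0), ("16", 0, 0), ("64", 0, 0), ("64", 0, 1)):
    print(kind, dpl, ist, tss_stack_address(kind, dpl, 0x67, ist))
```

The 32-bit, privilege-level-0 case (offset 4, with SS0 at offset 8) is also the one that INTERRUPT-FROM-VIRTUAL-8086-MODE reads.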

INTERRUPT-FROM-VIRTUAL-8086-MODE:

(* Identify stack-segment selector for privilege level 0 in current TSS *)

IF current TSS is 32-bit 
THEN 

IF TSS limit < 9 

THEN #TS(error_code(current TSS selector,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *) 

NewSS <- 2 bytes loaded from (current TSS base + 8); 


NewESP <- 4 bytes loaded from (current TSS base + 4);

ELSE (* current TSS is 16-bit *) 

IF TSS limit < 5 

THEN #TS(error_code(current TSS selector,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *)

NewSS <- 2 bytes loaded from (current TSS base + 4);

NewESP <- 2 bytes loaded from (current TSS base + 2);

FI; 

IF NewSS is NULL 

THEN #TS(EXT); FI; (* Error code contains NULL selector *) 

IF NewSS index is not within its descriptor table limits
or NewSS RPL ≠ 0

THEN #TS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

Read new stack-segment descriptor for NewSS in GDT or LDT;

IF new stack-segment DPL ≠ 0 or stack segment does not indicate writable data segment
THEN #TS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

IF new stack segment not present 

THEN #SS(error_code(NewSS,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *) 

IF IDT gate is 32-bit 
THEN 

IF new stack does not have room for 40 bytes (error code pushed) 
or 36 bytes (no error code pushed) 

THEN #SS(error_code(NewSS,0,EXT)); FI; 

(* idt operand to error_code is 0 because selector is used *)

ELSE (* IDT gate is 16-bit *)

IF new stack does not have room for 20 bytes (error code pushed)
or 18 bytes (no error code pushed)

THEN #SS(error_code(NewSS,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)

FI; 

IF instruction pointer from IDT gate is not within new code-segment limits 
THEN #GP(EXT); FI; (* Error code contains NULL selector *)
tempEFLAGS <- EFLAGS;

VM <- 0;

TF <- 0;

RF <- 0;

NT <- 0;

IF service through interrupt gate
THEN IF <- 0; FI;

TempSS <- SS;

TempESP <- ESP;

SS <- NewSS;

ESP <- NewESP;

(* Following pushes are 16 bits for 16-bit IDT gates and 32 bits for 32-bit IDT gates;
Segment selector pushes in 32-bit mode are padded to two words *)

Push(GS); 

Push(FS); 

Push(DS); 

Push(ES); 

Push(TempSS); 

Push(TempESP); 
Push(tempEFLAGS);

Push(CS); 

Push(EIP); 

GS <- 0; (* Segment registers made NULL, invalid for use in protected mode *)

FS <- 0;

DS <- 0;

ES <- 0;

CS <- Gate(CS); (* Segment descriptor information also loaded *)

IF OperandSize = 32
THEN

EIP <- Gate(instruction pointer);

ELSE (* OperandSize is 16 *)

EIP <- Gate(instruction pointer) AND 0000FFFFH;

FI; 

(* Start execution of new routine in Protected Mode *) 

END; 

INTRA-PRIVILEGE-LEVEL-INTERRUPT: 

(* PE = 1, DPL = CPL or conforming segment *) 

IF IA32_EFER.LMA = 1 (* IA-32e mode *)
THEN

IF IDT-descriptor IST ≠ 0
THEN

TSSstackAddress <- (IDT-descriptor IST << 3) + 28;

IF (TSSstackAddress + 7) > TSS limit

THEN #TS(error_code(current TSS selector,0,EXT)); FI;

(* idt operand to error_code is 0 because selector is used *)
NewRSP <- 8 bytes loaded from (current TSS base + TSSstackAddress);
FI;
FI;

IF 32-bit gate (* implies IA32_EFER.LMA = 0 *)

THEN

IF current stack does not have room for 16 bytes (error code pushed)
or 12 bytes (no error code pushed)

THEN #SS(EXT); FI; (* Error code contains NULL selector *)

ELSE IF 16-bit gate (* implies IA32_EFER.LMA = 0 *)

IF current stack does not have room for 8 bytes (error code pushed)
or 6 bytes (no error code pushed)

THEN #SS(EXT); FI; (* Error code contains NULL selector *)

ELSE (* IA32_EFER.LMA = 1, 64-bit gate *)

IF NewRSP contains a non-canonical address

THEN #SS(EXT); (* Error code contains NULL selector *)

FI;

FI;
IF (IA32_EFER.LMA = 0) (* Not IA-32e mode *)

THEN 

IF instruction pointer from IDT gate is not within new code-segment limit
THEN #GP(EXT); FI; (* Error code contains NULL selector *)

ELSE

IF instruction pointer from IDT gate contains a non-canonical address
THEN #GP(EXT); FI; (* Error code contains NULL selector *)

RSP <- NewRSP AND FFFFFFFFFFFFFFF0H;

FI;
IF IDT gate is 32-bit (* implies IA32_EFER.LMA = 0 *)

THEN 

Push (EFLAGS); 

Push (far pointer to return instruction); (* 3 words padded to 4 *) 
CS:EIP <- Gate(CS:EIP); (* Segment descriptor information also loaded *)

Push (ErrorCode); (* If any *) 

ELSE 

IF IDT gate is 16-bit (* implies IA32_EFER.LMA = 0 *)

THEN

Push (FLAGS);

Push (far pointer to return location); (* 2 words *)

CS:IP <- Gate(CS:IP);

(* Segment descriptor information also loaded *)

Push (ErrorCode); (* If any *)

ELSE (* IA32_EFER.LMA = 1, 64-bit gate *)

Push(far pointer to old stack); 

(* Old SS and SP, each an 8-byte push *) 

Push(RFLAGS); (* 8-byte push *) 

Push(far pointer to return instruction); 

(* Old CS and RIP, each an 8-byte push *) 

Push(ErrorCode); (* If needed, 8 bytes *) 

CS:RIP <- Gate(CS:RIP);

(* Segment descriptor information also loaded *) 

FI; 

FI; 

CS(RPL) <- CPL;

IF IDT gate is interrupt gate

THEN IF <- 0; FI; (* Interrupt flag set to 0; interrupts disabled *)

TF <- 0;

NT <- 0;

VM <- 0;

RF <- 0;

END; 
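The stack-room checks scattered through the routines above (24/20, 12/10, 40/36, 20/18, 16/12, and 8/6 bytes) all follow from counting pushed slots. The Python sketch below reproduces those numbers. It is an illustrative summary with hypothetical names, not part of the pseudocode.

The 64-bit figures assume, as the 64-bit branches show, that an IA-32e gate always pushes the old SS:RSP and uses 8-byte slots. The alignment helper mirrors NewRSP AND FFFFFFFFFFFFFFF0H.

```python
# Slots pushed before any error code (see the Push(...) sequences above).
SLOTS = {
    "intra": 3,  # EFLAGS, CS, EIP
    "inter": 5,  # SS, ESP, EFLAGS, CS, EIP
    "v86": 9,    # GS, FS, DS, ES, SS, ESP, EFLAGS, CS, EIP
}


def frame_bytes(gate_bits: int, kind: str, has_error_code: bool) -> int:
    """Bytes pushed on the handler's stack for a given gate size and path."""
    if gate_bits == 64:
        if kind == "v86":
            raise ValueError("virtual-8086 mode is not available in IA-32e mode")
        slots = 5  # SS, RSP, RFLAGS, CS, RIP are always pushed
    elif gate_bits in (16, 32):
        slots = SLOTS[kind]
    else:
        raise ValueError("gate_bits must be 16, 32, or 64")
    return (slots + int(has_error_code)) * (gate_bits // 8)


def align_new_rsp(new_rsp: int) -> int:
    """IA-32e mode: RSP <- NewRSP AND FFFFFFFFFFFFFFF0H (16-byte alignment)."""
    return new_rsp & 0xFFFF_FFFF_FFFF_FFF0
```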

Flags Affected 

The EFLAGS register is pushed onto the stack. The IF, TF, NT, AC, RF, and VM flags may be cleared, depending on 
the mode of operation of the processor when the INT instruction is executed (see the "Operation" section). If the 
interrupt uses a task gate, any flags may be set or cleared, controlled by the EFLAGS image in the new task's TSS. 

Protected Mode Exceptions 

#GP(error_code) If the instruction pointer in the IDT or in the interrupt-, trap-, or task gate is beyond the code 
segment limits. 

If the segment selector in the interrupt-, trap-, or task gate is NULL. 

If an interrupt-, trap-, or task gate, code segment, or TSS segment selector index is outside its 
descriptor table limits. 

If the vector selects a descriptor outside the IDT limits. 

If an IDT descriptor is not an interrupt-, trap-, or task-descriptor. 

If an interrupt is generated by the INT n, INT 3, or INTO instruction and the DPL of an 
interrupt-, trap-, or task-descriptor is less than the CPL.

If the segment selector in an interrupt- or trap-gate does not point to a segment descriptor for 
a code segment. 

If the segment selector for a TSS has its local/global bit set for local. 

If a TSS segment descriptor specifies that the TSS is busy or not available. 
#SS(error_code) If pushing the return address, flags, or error code onto the stack exceeds the bounds of the 
stack segment and no stack switch occurs.

If the SS register is being loaded and the segment pointed to is marked not present.

If pushing the return address, flags, error code, or stack segment pointer exceeds the bounds 
of the new stack segment when a stack switch occurs.

#NP(error_code) If code segment, interrupt-, trap-, or task gate, or TSS is not present.

#TS(error_code) If the RPL of the stack segment selector in the TSS is not equal to the DPL of the code segment 
being accessed by the interrupt or trap gate.

If DPL of the stack segment descriptor pointed to by the stack segment selector in the TSS is 
not equal to the DPL of the code segment descriptor for the interrupt or trap gate.

If the stack segment selector in the TSS is NULL.

If the stack segment for the TSS is not a writable data segment.

If segment-selector index for stack segment is outside descriptor table limits.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.

#AC(EXT) If alignment checking is enabled, the gate DPL is 3, and a stack push is unaligned.


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the interrupt vector number is outside the IDT limits. 

#SS If a stack limit violation occurs on a push.

If pushing the return address, flags, or error code onto the stack exceeds the bounds of the 
stack segment. 

#UD If the LOCK prefix is used. 


Virtual-8086 Mode Exceptions


#GP(error_code) (For INT n, INTO, or BOUND instruction) If the IOPL is less than 3 or the DPL of the 
interrupt-, trap-, or task-gate descriptor is not equal to 3.

If the instruction pointer in the IDT or in the interrupt-, trap-, or task gate is beyond the code 
segment limits.

If the segment selector in the interrupt-, trap-, or task gate is NULL.

If an interrupt-, trap-, or task gate, code segment, or TSS segment selector index is outside its 
descriptor table limits.

If the vector selects a descriptor outside the IDT limits.

If an IDT descriptor is not an interrupt-, trap-, or task-descriptor.

If an interrupt is generated by the INT n instruction and the DPL of an interrupt-, trap-, or 
task-descriptor is less than the CPL.

If the segment selector in an interrupt- or trap-gate does not point to a segment descriptor for 
a code segment.

If the segment selector for a TSS has its local/global bit set for local.

#SS(error_code) If the SS register is being loaded and the segment pointed to is marked not present.

If pushing the return address, flags, error code, stack segment pointer, or data segments 
exceeds the bounds of the stack segment.

#NP(error_code) If code segment, interrupt-, trap-, or task gate, or TSS is not present.

#TS(error_code) If the RPL of the stack segment selector in the TSS is not equal to the DPL of the code segment 
being accessed by the interrupt or trap gate.

If DPL of the stack segment descriptor for the TSS's stack segment is not equal to the DPL of 
the code segment descriptor for the interrupt or trap gate.

If the stack segment selector in the TSS is NULL.

If the stack segment for the TSS is not a writable data segment.

If segment-selector index for stack segment is outside descriptor table limits.

#PF(fault-code) If a page fault occurs.

#BP If the INT 3 instruction is executed.

#OF If the INTO instruction is executed and the OF flag is set.

#UD If the LOCK prefix is used.

#AC(EXT) If alignment checking is enabled, the gate DPL is 3, and a stack push is unaligned.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#GP(error_code) If the instruction pointer in the 64-bit interrupt gate or 64-bit trap gate is non-canonical.

If the segment selector in the 64-bit interrupt or trap gate is NULL.

If the vector selects a descriptor outside the IDT limits.

If the vector points to a gate which is in non-canonical space.

If the vector points to a descriptor which is not a 64-bit interrupt gate or 64-bit trap gate.

If the descriptor pointed to by the gate selector is outside the descriptor table limit.

If the descriptor pointed to by the gate selector is in non-canonical space.

If the descriptor pointed to by the gate selector is not a code segment.

If the descriptor pointed to by the gate selector doesn't have the L-bit set, or has both the 
L-bit and D-bit set.

If the descriptor pointed to by the gate selector has DPL > CPL.

#SS(error_code) If a push of the old EFLAGS, CS selector, EIP, or error code is in non-canonical space with no 
stack switch.

If a push of the old SS selector, ESP, EFLAGS, CS selector, EIP, or error code is in non-canonical 
space on a stack switch (either CPL change or no-CPL with IST).

#NP(error_code) If the 64-bit interrupt-gate, 64-bit trap-gate, or code segment is not present.

#TS(error_code) If an attempt to load RSP from the TSS causes an access to non-canonical space.

If the RSP from the TSS is outside descriptor table limits.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.

#AC(EXT) If alignment checking is enabled, the gate DPL is 3, and a stack push is unaligned.
INVD—Invalidate Internal Caches

Opcode   Instruction   Op/En   64-Bit Mode   Compat/Leg Mode   Description
0F 08    INVD          NP      Valid         Valid             Flush internal caches; initiate flushing of
                                                               external caches.

NOTES:

* See the IA-32 Architecture Compatibility section below.

Instruction Operand Encoding

Op/En   Operand 1   Operand 2   Operand 3   Operand 4
NP      NA          NA          NA          NA


Description 

Invalidates (flushes) the processor's internal caches and issues a special-function bus cycle that directs external 
caches to also flush themselves. Data held in internal caches is not written back to main memory. 

After executing this instruction, the processor does not wait for the external caches to complete their flushing 
operation before proceeding with instruction execution. It is the responsibility of hardware to respond to the cache flush
signal. 

The INVD instruction is a privileged instruction. When the processor is running in protected mode, the CPL of a 
program or procedure must be 0 to execute this instruction. 

The INVD instruction may be used when the cache is used as temporary memory and the cache contents need to 
be invalidated rather than written back to memory. When the cache is used as temporary memory, no external 
device should be actively writing data to main memory. 

Use this instruction with care. Data cached internally and not written back to main memory will be lost. Note that 
any data from an external device to main memory (for example, via a PCIWrite) can be temporarily stored in the 
caches; these data can be lost when an INVD instruction is executed. Unless there is a specific requirement or 
benefit to flushing caches without writing back modified cache lines (for example, temporary memory, testing, or 
fault recovery where cache coherency with main memory is not a concern), software should instead use the 
WBINVD instruction. 
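A toy model shows the difference between discarding modified lines and writing them back. The Python class below is purely illustrative. It treats memory and a write-back cache as dictionaries: `invd` drops every line, as INVD does, and `wbinvd` writes dirty lines back first, as WBINVD does. It does not model external caches, coherency, or any real hardware interface.

```python
class ToyWriteBackCache:
    """Illustrative write-back cache in front of a dict standing in for memory."""

    def __init__(self, memory: dict):
        self.memory = memory
        self.lines = {}  # address -> (value, dirty)

    def write(self, addr: int, value: int) -> None:
        self.lines[addr] = (value, True)  # write-back: memory is not updated yet

    def read(self, addr: int) -> int:
        if addr in self.lines:
            return self.lines[addr][0]
        return self.memory.get(addr, 0)

    def invd(self) -> None:
        self.lines.clear()  # invalidate without write-back: dirty data is lost

    def wbinvd(self) -> None:
        for addr, (value, dirty) in self.lines.items():
            if dirty:
                self.memory[addr] = value  # write modified lines back first
        self.lines.clear()


mem = {0x1000: 1}
cache = ToyWriteBackCache(mem)
cache.write(0x1000, 2)
cache.invd()
print(cache.read(0x1000))  # 1: the modified value never reached memory
```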

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

The INVD instruction is implementation dependent; it may be implemented differently on different families of Intel 
64 or IA-32 processors. This instruction is not supported on IA-32 processors earlier than the Intel486 processor. 

Operation 

Flush(InternalCaches);

SignalFlush(ExternalCaches);

Continue; (* Continue execution *)

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#UD If the LOCK prefix is used. 


Virtual-8086 Mode Exceptions

#GP(0) The INVD instruction cannot be executed in virtual-8086 mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
INVLPG—Invalidate TLB Entries

Opcode    Instruction   Op/En   64-Bit Mode   Compat/Leg Mode   Description
0F 01/7   INVLPG m      M       Valid         Valid             Invalidate TLB entries for page containing m.

NOTES:

* See the IA-32 Architecture Compatibility section below.

Instruction Operand Encoding

Op/En   Operand 1       Operand 2   Operand 3   Operand 4
M       ModRM:r/m (r)   NA          NA          NA


Description 

Invalidates any translation lookaside buffer (TLB) entries specified with the source operand. The source operand is 
a memory address. The processor determines the page that contains that address and flushes all TLB entries for 
that page.¹

The INVLPG instruction is a privileged instruction. When the processor is running in protected mode, the CPL must 
be 0 to execute this instruction. 

The INVLPG instruction normally flushes TLB entries only for the specified page; however, in some cases, it may 
flush more entries, even the entire TLB. The instruction is guaranteed to invalidate only TLB entries associated 
with the current PCID. (If PCIDs are disabled, that is, CR4.PCIDE = 0, the current PCID is 000H.) The instruction also 
invalidates any global TLB entries for the specified page, regardless of PCID.
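The guaranteed effect described above can be stated precisely with a toy TLB model. The Python sketch below uses hypothetical names and assumes 4-KByte pages. A TLB is modeled as a set of entries, each tagged with a PCID, a page number, and a global flag. `invlpg_minimum` removes exactly the entries that INVLPG must invalidate. Real processors may remove more, as noted above, so the result shows what may survive rather than what will.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12  # assumption: 4-KByte pages


@dataclass(frozen=True)
class TlbEntry:
    pcid: int
    page: int          # linear address >> PAGE_SHIFT
    is_global: bool


def invlpg_minimum(tlb: set, linear_address: int, current_pcid: int) -> set:
    """Entries that may survive INVLPG: all except those for the page that
    belong to the current PCID or are global (for any PCID)."""
    page = linear_address >> PAGE_SHIFT

    def must_go(e: TlbEntry) -> bool:
        return e.page == page and (e.pcid == current_pcid or e.is_global)

    return {e for e in tlb if not must_go(e)}
```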

For more details on operations that flush the TLB, see "MOV—Move to/from Control Registers" in the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 2B and Section 4.10.4.1, "Operations that Invalidate 
TLBs and Paging-Structure Caches," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, 
Volume 3A. 

This instruction's operation is the same in all non-64-bit modes. It also operates the same in 64-bit mode, except 
if the memory address is in non-canonical form. In this case, INVLPG is the same as a NOP. 
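The canonical-form test referred to here, and used throughout the IA-32e pseudocode for INT n, can be sketched in Python. The sketch assumes 48-bit linear addresses, as with 4-level paging, so an address is canonical when bits 63:47 are all equal. The `linear_bits` parameter is illustrative; with 5-level paging it would be 57.

```python
def is_canonical(addr: int, linear_bits: int = 48) -> bool:
    """True if bits 63:(linear_bits - 1) of a 64-bit address are all 0 or all 1."""
    addr &= (1 << 64) - 1
    upper = addr >> (linear_bits - 1)          # the top (65 - linear_bits) bits
    return upper in (0, (1 << (65 - linear_bits)) - 1)


def invlpg_is_nop(addr: int) -> bool:
    """In 64-bit mode, INVLPG with a non-canonical address behaves as a NOP."""
    return not is_canonical(addr)
```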

IA-32 Architecture Compatibility 

The INVLPG instruction is implementation dependent, and its function may be implemented differently on different 
families of Intel 64 or IA-32 processors. This instruction is not supported on IA-32 processors earlier than the 
Intel486 processor. 

Operation 

Invalidate(RelevantTLBEntries); 

Continue; (* Continue execution *) 

Flags Affected 

None. 

Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

#UD Operand is a register. 

If the LOCK prefix is used. 


1. If the paging structures map the linear address using a page larger than 4 KBytes and there are multiple TLB entries for that page 
(see Section 4.10.2.3, "Details of TLB Use," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A), the
instruction invalidates all of them. 
Real-Address Mode Exceptions

#UD Operand is a register. 

If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) The INVLPG instruction cannot be executed in virtual-8086 mode.

64-Bit Mode Exceptions

#GP(0) If the current privilege level is not 0. 

#UD Operand is a register. 

If the LOCK prefix is used. 
INVPCID—Invalidate Process-Context Identifier

Opcode/Instruction   Op/En   64/32-bit Mode   CPUID Feature Flag   Description
66 0F 38 82 /r       RM      NE/V             INVPCID              Invalidates entries in the TLBs and paging-structure
INVPCID r32, m128                                                  caches based on invalidation type in r32 and
                                                                   descriptor in m128.
66 0F 38 82 /r       RM      V/NE             INVPCID              Invalidates entries in the TLBs and paging-structure
INVPCID r64, m128                                                  caches based on invalidation type in r64 and
                                                                   descriptor in m128.

Instruction Operand Encoding

Op/En   Operand 1       Operand 2       Operand 3   Operand 4
RM      ModRM:reg (R)   ModRM:r/m (R)   NA          NA


Description 

Invalidates mappings in the translation lookaside buffers (TLBs) and paging-structure caches based on 
process-context identifier (PCID). (See Section 4.10, "Caching Translation Information," in the Intel® 64 and IA-32 
Architectures Software Developer's Manual, Volume 3A.) Invalidation is based on the INVPCID type specified in the register
operand and the INVPCID descriptor specified in the memory operand. 

Outside 64-bit mode, the register operand is always 32 bits, regardless of the value of CS.D. In 64-bit mode the 
register operand has 64 bits. 

There are four INVPCID types currently defined: 

• Individual-address invalidation: If the INVPCID type is 0, the logical processor invalidates mappings—except 
global translations—for the linear address and PCID specified in the INVPCID descriptor.¹ In some cases, the
instruction may invalidate global translations or mappings for other linear addresses (or other PCIDs) as well. 

• Single-context invalidation: If the INVPCID type is 1, the logical processor invalidates all mappings—except 
global translations—associated with the PCID specified in the INVPCID descriptor. In some cases, the 
instruction may invalidate global translations or mappings for other PCIDs as well. 

• All-context invalidation, including global translations: If the INVPCID type is 2, the logical processor invalidates 
all mappings—including global translations—associated with any PCID. 

• All-context invalidation: If the INVPCID type is 3, the logical processor invalidates all mappings—except global 
translations—associated with any PCID. In some cases, the instruction may invalidate global translations as
well. 
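The four types can be illustrated with a toy software model of the TLB. This is a minimal sketch, not how any processor is built: it assumes a hypothetical `Entry` record per cached 4-KByte translation (PCID, page number, global flag) and removes only the entries each type is required to invalidate. As noted above, real hardware may invalidate more, and the large-page case from the footnote is not modeled.

```python
from dataclasses import dataclass

# Hypothetical record for one cached 4-KByte translation (illustration only).
@dataclass(frozen=True)
class Entry:
    pcid: int        # 12-bit process-context identifier
    page: int        # linear address >> 12
    is_global: bool  # global translation (G bit set)

def invpcid(tlb: set, inv_type: int, pcid: int = 0, linear_addr: int = 0) -> set:
    """Return the entries that survive the minimum invalidation each type requires."""
    if inv_type == 0:  # individual-address: one page of one PCID, non-global only
        page = linear_addr >> 12
        return {e for e in tlb
                if e.is_global or e.pcid != pcid or e.page != page}
    if inv_type == 1:  # single-context: every non-global entry of one PCID
        return {e for e in tlb if e.is_global or e.pcid != pcid}
    if inv_type == 2:  # all-context, including global translations
        return set()
    if inv_type == 3:  # all-context, retaining global translations
        return {e for e in tlb if e.is_global}
    raise ValueError("INVPCID type > 3 raises #GP(0)")
```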

The INVPCID descriptor comprises 128 bits and consists of a PCID and a linear address as shown in Figure 3-24. 
For INVPCID type 0, the processor uses the full 64 bits of the linear address even outside 64-bit mode; the linear 
address is not used for other INVPCID types. 


[Bits 127:64: Linear Address] [Bits 63:12: Reserved (must be zero)] [Bits 11:0: PCID]

Figure 3-24. INVPCID Descriptor
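The descriptor layout in Figure 3-24 can be expressed as a packing helper. The sketch below assumes the little-endian memory order of Intel 64 and IA-32 processors; `pack_invpcid_descriptor` and `unpack_invpcid_descriptor` are hypothetical names used only for illustration, and the reserved-bit check mirrors the #GP(0) condition listed under the exceptions.

```python
import struct

def pack_invpcid_descriptor(pcid: int, linear_addr: int = 0) -> bytes:
    """Build the 16-byte INVPCID descriptor.

    Bits 11:0 hold the PCID, bits 63:12 are reserved (zero), and
    bits 127:64 hold the linear address (used only by type 0).
    """
    if not 0 <= pcid < 1 << 12:
        raise ValueError("PCID must fit in 12 bits")
    if not 0 <= linear_addr < 1 << 64:
        raise ValueError("linear address must fit in 64 bits")
    # Little-endian: the low quadword (PCID plus reserved zeros) comes first.
    return struct.pack("<QQ", pcid, linear_addr)

def unpack_invpcid_descriptor(desc: bytes) -> tuple:
    """Split a descriptor into (PCID, linear address), checking reserved bits."""
    low, high = struct.unpack("<QQ", desc)
    if low >> 12:
        # The processor raises #GP(0) in this case.
        raise ValueError("bits 63:12 must be zero")
    return low & 0xFFF, high
```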


1. If the paging structures map the linear address using a page larger than 4 KBytes and there are multiple TLB entries for that page 
(see Section 4.10.2.3, "Details of TLB Use," in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A), the
instruction invalidates all of them. 
If CR4.PCIDE = 0, a logical processor does not cache information for any PCID other than 000H. In this case,
executions with INVPCID types 0 and 1 are allowed only if the PCID specified in the INVPCID descriptor is 000H;
executions with INVPCID types 2 and 3 invalidate mappings only for PCID 000H. Note that CR4.PCIDE must be 0
outside 64-bit mode (see Section 4.10.1, "Process-Context Identifiers (PCIDs)," of the Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 3A).

Operation 

INVPCID_TYPE <- value of register operand; // must be in the range of 0-3
INVPCID_DESC <- value of memory operand; 

CASE INVPCID_TYPE OF 

0: // Individual-address Invalidation 

PCID <- INVPCID_DESC[11:0];

L_ADDR <- INVPCID_DESC[127:64];

Invalidate mappings for L_ADDR associated with PCID except global translations; 

BREAK; 

1: // single PCID invalidation 

PCID <- INVPCID_DESC[11:0];

Invalidate all mappings associated with PCID except global translations; 

BREAK; 

2: // all PCID invalidation including global translations 

Invalidate all mappings for all PCIDs, including global translations; 

BREAK; 

3: // all PCID invalidation retaining global translations 

Invalidate all mappings for all PCIDs except global translations; 

BREAK; 

ESAC; 

Intel C/C++ Compiler Intrinsic Equivalent 

INVPCID: void _invpcid(unsigned __int32 type, void * descriptor);

SIMD Floating-Point Exceptions 

None 

Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

If the memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 
If the DS, ES, FS, or GS register contains an unusable segment. 

If the source operand is located in an execute-only code segment. 

If an invalid type is specified in the register operand, i.e., INVPCID_TYPE > 3. 

If bits 63:12 of INVPCID_DESC are not all zero. 

If INVPCID_TYPE is either 0 or 1 and INVPCID_DESC[11:0] is not zero. 

If INVPCID_TYPE is 0 and the linear address in INVPCID_DESC[127:64] is not canonical. 
#PF(fault-code) If a page fault occurs in accessing the memory operand. 

#SS(0) If the memory operand effective address is outside the SS segment limit. 

If the SS register contains an unusable segment. 

#UD If CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) = 0.

If the LOCK prefix is used. 
Real-Address Mode Exceptions

#GP If an invalid type is specified in the register operand, i.e., INVPCID_TYPE > 3. 

If bits 63:12 of INVPCID_DESC are not all zero. 

If INVPCID_TYPE is either 0 or 1 and INVPCID_DESC[11:0] is not zero. 

If INVPCID_TYPE is 0 and the linear address in INVPCID_DESC[127:64] is not canonical. 

#UD If CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) = 0.

If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

#GP(0) The INVPCID instruction is not recognized in virtual-8086 mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

If the memory operand is in the CS, DS, ES, FS, or GS segments and the memory address is 
in a non-canonical form. 

If an invalid type is specified in the register operand, i.e., INVPCID_TYPE > 3. 

If bits 63:12 of INVPCID_DESC are not all zero. 

If CR4.PCIDE=0, INVPCID_TYPE is either 0 or 1, and INVPCID_DESC[11:0] is not zero. 

If INVPCID_TYPE is 0 and the linear address in INVPCID_DESC[127:64] is not canonical. 
#PF(fault-code) If a page fault occurs in accessing the memory operand. 

#SS(0) If the memory operand is in the SS segment and the memory address is in a non-canonical form.

#UD If the LOCK prefix is used. 

If CPUID.(EAX=07H, ECX=0H):EBX.INVPCID (bit 10) = 0. 
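The operand checks in the 64-bit mode list can be collected into one predicate. This is a sketch under stated assumptions: the function names are hypothetical, the CPL, segment, page-fault, and #UD conditions are left out, and canonicality is tested for 48-bit linear addresses (4-level paging); with 5-level paging the width would be 57 bits.

```python
def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """Bits 63:va_bits-1 must all equal bit va_bits-1 (48-bit assumption)."""
    top = addr >> (va_bits - 1)
    return top == 0 or top == (1 << (65 - va_bits)) - 1

def invpcid_gp_64(inv_type: int, desc_low: int, desc_high: int,
                  cr4_pcide: bool) -> bool:
    """Return True if one of the descriptor/type #GP(0) checks fails.

    desc_low is INVPCID_DESC[63:0]; desc_high is INVPCID_DESC[127:64].
    """
    if inv_type > 3:
        return True                      # invalid INVPCID type
    if desc_low >> 12:
        return True                      # reserved bits 63:12 not zero
    if not cr4_pcide and inv_type in (0, 1) and desc_low & 0xFFF:
        return True                      # nonzero PCID while CR4.PCIDE = 0
    if inv_type == 0 and not is_canonical(desc_high):
        return True                      # non-canonical linear address
    return False
```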
IRET/IRETD—Interrupt Return


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
CF | IRET | NP | Valid | Valid | Interrupt return (16-bit operand size).
CF | IRETD | NP | Valid | Valid | Interrupt return (32-bit operand size).
REX.W + CF | IRETQ | NP | Valid | N.E. | Interrupt return (64-bit operand size).


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
NP | NA | NA | NA | NA


Description 

Returns program control from an exception or interrupt handler to a program or procedure that was interrupted by 
an exception, an external interrupt, or a software-generated interrupt. These instructions are also used to perform 
a return from a nested task. (A nested task is created when a CALL instruction is used to initiate a task switch or 
when an interrupt or exception causes a task switch to an interrupt or exception handler.) See the section titled 
"Task Linking" in Chapter 7 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A. 

IRET and IRETD are mnemonics for the same opcode. The IRETD mnemonic (interrupt return double) is intended 
for use when returning from an interrupt when using the 32-bit operand size; however, most assemblers use the 
IRET mnemonic interchangeably for both operand sizes. 

In real-address mode, the IRET instruction performs a far return to the interrupted program or procedure. During
this operation, the processor pops the return instruction pointer, return code segment selector, and EFLAGS image
from the stack to the EIP, CS, and EFLAGS registers, respectively, and then resumes execution of the interrupted
program or procedure.

In Protected Mode, the action of the IRET instruction depends on the settings of the NT (nested task) and VM flags 
in the EFLAGS register and the VM flag in the EFLAGS image stored on the current stack. Depending on the setting 
of these flags, the processor performs the following types of interrupt returns: 

• Return from virtual-8086 mode. 

• Return to virtual-8086 mode. 

• Intra-privilege level return. 

• Inter-privilege level return. 

• Return from nested task (task switch). 
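Which of these returns is performed follows from a few state bits. The sketch below transcribes the top-level dispatch of the Operation section into a hypothetical `iret_path` function; it returns the name of the branch taken and deliberately skips the limit, selector, and IOPL checks those branches perform.

```python
def iret_path(pe: int, lma: int, vm: int, nt: int,
              temp_vm: int = 0, cpl: int = 0, cs_rpl: int = 0) -> str:
    """Name the Operation-section branch that IRET takes.

    vm and nt come from the current EFLAGS; temp_vm is the VM bit of the
    popped EFLAGS image (only a 32-bit pop can set it, since a 16-bit pop
    clears the upper bits); cs_rpl is the RPL of the popped CS selector.
    """
    if pe == 0:
        return "REAL-ADDRESS-MODE"
    if lma == 1:                                  # IA-32e mode
        if nt == 1:
            return "#GP(0)"
        return ("RETURN-TO-OUTER-PRIVILEGE-LEVEL" if cs_rpl > cpl
                else "RETURN-TO-SAME-PRIVILEGE-LEVEL")
    if vm == 1:
        return "RETURN-FROM-VIRTUAL-8086-MODE"    # also requires IOPL = 3
    if nt == 1:
        return "TASK-RETURN"
    if temp_vm == 1 and cpl == 0:
        return "RETURN-TO-VIRTUAL-8086-MODE"
    if cs_rpl > cpl:
        return "RETURN-TO-OUTER-PRIVILEGE-LEVEL"
    return "RETURN-TO-SAME-PRIVILEGE-LEVEL"
```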

If the NT flag (EFLAGS register) is cleared, the IRET instruction performs a far return from the interrupt procedure, 
without a task switch. The code segment being returned to must be equally or less privileged than the interrupt 
handler routine (as indicated by the RPL field of the code segment selector popped from the stack). 

As with a real-address mode interrupt return, the IRET instruction pops the return instruction pointer, return code 
segment selector, and EFLAGS image from the stack to the EIP, CS, and EFLAGS registers, respectively, and then 
resumes execution of the interrupted program or procedure. If the return is to another privilege level, the IRET 
instruction also pops the stack pointer and SS from the stack, before resuming program execution. If the return is 
to virtual-8086 mode, the processor also pops the data segment registers from the stack. 
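The stack frame that IRET consumes can be listed programmatically. This is an illustrative sketch with hypothetical names; each slot is one operand-size pop, and the rule that an IRET beginning in 64-bit mode always pops SS and RSP is taken from the IA-32e branch of the Operation section.

```python
def iret_frame(operand_size: int, privilege_change: bool = False,
               to_v86: bool = False, mode64: bool = False) -> list:
    """List (item, stack offset) pairs that IRET pops, lowest address first.

    operand_size is 16, 32, or 64; every pop is operand_size bits wide.
    """
    if to_v86 and operand_size != 32:
        # VM is EFLAGS bit 17, so only a 32-bit EFLAGS image can set it.
        raise ValueError("a return to virtual-8086 mode needs a 32-bit image")
    items = ["IP", "CS", "FLAGS"]
    # An IRET that begins in 64-bit mode pops SS:RSP even at the same level.
    if privilege_change or to_v86 or mode64:
        items += ["SP", "SS"]
    if to_v86:
        items += ["ES", "DS", "FS", "GS"]
    width = operand_size // 8
    return [(name, i * width) for i, name in enumerate(items)]
```

For example, `iret_frame(32)` gives the three-slot frame of a same-privilege return, while `iret_frame(32, to_v86=True)` gives nine slots ending with GS at offset 32.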

If the NT flag is set, the IRET instruction performs a task switch (return) from a nested task (a task called with a 
CALL instruction, an interrupt, or an exception) back to the calling or interrupted task. The updated state of the 
task executing the IRET instruction is saved in its TSS. If the task is re-entered later, the code that follows the IRET 
instruction is executed. 

If the NT flag is set and the processor is in IA-32e mode, the IRET instruction causes a general protection
exception.

If nonmaskable interrupts (NMIs) are blocked (see Section 6.7.1, "Handling Multiple NMIs," in the Intel® 64 and
IA-32 Architectures Software Developer's Manual, Volume 3A), execution of the IRET instruction unblocks NMIs.
This unblocking occurs even if the instruction causes a fault. In such a case, NMIs are unmasked before the
exception handler is invoked.

In 64-bit mode, the instruction's default operation size is 32 bits. Use of the REX.W prefix promotes operation to 64
bits (IRETQ). See the summary chart at the beginning of this section for encoding data and limits. 

See "Changes to Instruction Behavior in VMX Non-Root Operation" in Chapter 25 of the Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 3C, for more information about the behavior of this instruction
in VMX non-root operation.

Operation 

IF PE = 0

THEN GOTO REAL-ADDRESS-MODE; 

ELSIF (IA32_EFER.LMA = 0) 

THEN 

IF (EFLAGS.VM = 1) 

THEN GOTO RETURN-FROM-VIRTUAL-8086-MODE; 

ELSE GOTO PROTECTED-MODE; 

FI; 

ELSE GOTO IA-32e-MODE; 

FI; 

REAL-ADDRESS-MODE:

IF OperandSize = 32
THEN

EIP <- Pop();

CS <- Pop(); (* 32-bit pop, high-order 16 bits discarded *)
tempEFLAGS <- Pop();

EFLAGS <- (tempEFLAGS AND 257FD5H) OR (EFLAGS AND 1A0000H);

ELSE (* OperandSize = 16 *)

EIP <- Pop(); (* 16-bit pop; clear upper 16 bits *)

CS <- Pop(); (* 16-bit pop *)

EFLAGS[15:0] <- Pop();

FI;

END; 

RETURN-FROM-VIRTUAL-8086-MODE: 

(* Processor is in virtual-8086 mode when IRET is executed and stays in virtual-8086 mode *) 

IF IOPL = 3 (* Virtual mode: PE = 1, VM = 1, IOPL = 3 *)

THEN IF OperandSize = 32
THEN

EIP <- Pop();

CS <- Pop(); (* 32-bit pop, high-order 16 bits discarded *)

EFLAGS <- Pop();

(* VM, IOPL, VIP, and VIF EFLAGS bits not modified by pop *)

IF EIP not within CS limit
THEN #GP(0); FI;

ELSE (* OperandSize = 16 *)

EIP <- Pop(); (* 16-bit pop; clear upper 16 bits *)

CS <- Pop(); (* 16-bit pop *)

EFLAGS[15:0] <- Pop(); (* IOPL in EFLAGS not modified by pop *)

IF EIP not within CS limit
THEN #GP(0); FI;

FI;

ELSE

#GP(0); (* Trap to virtual-8086 monitor: PE = 1, VM = 1, IOPL < 3 *)

FI; 

END; 

PROTECTED-MODE: 

IF NT = 1

THEN GOTO TASK-RETURN; (* PE = 1, VM = 0, NT = 1 *)

FI;

IF OperandSize = 32
THEN

EIP <- Pop();

CS <- Pop(); (* 32-bit pop, high-order 16 bits discarded *)
tempEFLAGS <- Pop();

ELSE (* OperandSize = 16 *)

EIP <- Pop(); (* 16-bit pop; clear upper bits *)

CS <- Pop(); (* 16-bit pop *)

tempEFLAGS <- Pop(); (* 16-bit pop; clear upper bits *)

FI;

IF tempEFLAGS(VM) = 1 and CPL = 0

THEN GOTO RETURN-TO-VIRTUAL-8086-MODE; 

ELSE GOTO PROTECTED-MODE-RETURN; 

FI; 

TASK-RETURN: (* PE = 1, VM = 0, NT = 1 *) 

SWITCH-TASKS (without nesting) to TSS specified in link field of current TSS; 

Mark the task just abandoned as NOT BUSY; 

IF EIP is not within CS limit 
THEN #GP(0); FI; 

END; 

RETURN-TO-VIRTUAL-8086-MODE: 

(* Interrupted procedure was in virtual-8086 mode: PE = 1, CPL = 0, VM = 1 in flag image *)
IF EIP not within CS limit
THEN #GP(0); FI;

EFLAGS <- tempEFLAGS;

ESP <- Pop();

SS <- Pop(); (* Pop 2 words; throw away high-order word *)

ES <- Pop(); (* Pop 2 words; throw away high-order word *)

DS <- Pop(); (* Pop 2 words; throw away high-order word *)

FS <- Pop(); (* Pop 2 words; throw away high-order word *)

GS <- Pop(); (* Pop 2 words; throw away high-order word *)

CPL <- 3;

(* Resume execution in Virtual-8086 mode *) 

END; 

PROTECTED-MODE-RETURN: (* PE = 1 *) 

IF CS(RPL) > CPL 

THEN GOTO RETURN-TO-OUTER-PRIVILEGE-LEVEL; 

ELSE GOTO RETURN-TO-SAME-PRIVILEGE-LEVEL; FI; 

END; 

RETURN-TO-OUTER-PRIVILEGE-LEVEL: 

IF OperandSize = 32
THEN

ESP <- Pop();

SS <- Pop(); (* 32-bit pop, high-order 16 bits discarded *)
ELSE IF OperandSize = 16
THEN

ESP <- Pop(); (* 16-bit pop; clear upper bits *)

SS <- Pop(); (* 16-bit pop *)

ELSE (* OperandSize = 64 *)

RSP <- Pop();

SS <- Pop(); (* 64-bit pop, high-order 48 bits discarded *)

FI;
IF new mode ≠ 64-Bit Mode
THEN

IF EIP is not within CS limit
THEN #GP(0); FI;

ELSE (* new mode = 64-bit mode *)

IF RIP is non-canonical

THEN #GP(0); FI;

FI;

EFLAGS (CF, PF, AF, ZF, SF, TF, DF, OF, NT) <- tempEFLAGS;

IF OperandSize = 32

THEN EFLAGS(RF, AC, ID) <- tempEFLAGS; FI;

IF CPL ≤ IOPL

THEN EFLAGS(IF) <- tempEFLAGS; FI;

IF CPL = 0

THEN

EFLAGS(IOPL) <- tempEFLAGS;

IF OperandSize = 32

THEN EFLAGS(VM, VIF, VIP) <- tempEFLAGS; FI;

IF OperandSize = 64

THEN EFLAGS(VIF, VIP) <- tempEFLAGS; FI;

FI;

CPL <- CS(RPL);

FOR each SegReg in (ES, FS, GS, and DS)

DO

tempDesc <- descriptor cache for SegReg (* hidden part of segment register *)
IF tempDesc(DPL) < CPL AND tempDesc(Type) is data or non-conforming code
THEN (* Segment register invalid *)

SegReg <- NULL;

FI;

OD; 

END; 

RETURN-TO-SAME-PRIVILEGE-LEVEL: (* PE = 1, RPL = CPL *) 

IF new mode ≠ 64-Bit Mode
THEN

IF EIP is not within CS limit
THEN #GP(0); FI;

ELSE (* new mode = 64-bit mode *)

IF RIP is non-canonical

THEN #GP(0); FI;

FI;

EFLAGS (CF, PF, AF, ZF, SF, TF, DF, OF, NT) <- tempEFLAGS;

IF OperandSize = 32 or OperandSize = 64

THEN EFLAGS(RF, AC, ID) <- tempEFLAGS; FI;
IF CPL ≤ IOPL

THEN EFLAGS(IF) <- tempEFLAGS; FI;

IF CPL = 0

THEN (* VM = 0 in flags image *)

EFLAGS(IOPL) <- tempEFLAGS;

IF OperandSize = 32 or OperandSize = 64
THEN EFLAGS(VIF, VIP) <- tempEFLAGS; FI;

FI; 

END; 

IA-32e-MODE: 

IF NT = 1 

THEN #GP(0); 

ELSE IF OperandSize = 32
THEN

EIP <- Pop();

CS <- Pop();
tempEFLAGS <- Pop();

ELSE IF OperandSize = 16

THEN

EIP <- Pop(); (* 16-bit pop; clear upper bits *)

CS <- Pop(); (* 16-bit pop *)

tempEFLAGS <- Pop(); (* 16-bit pop; clear upper bits *)

ELSE (* OperandSize = 64 *)

RIP <- Pop();

CS <- Pop(); (* 64-bit pop, high-order 48 bits discarded *)
tempRFLAGS <- Pop();

FI;

IF tempCS.RPL > CPL

THEN GOTO RETURN-TO-OUTER-PRIVILEGE-LEVEL;

ELSE

IF instruction began in 64-Bit Mode
THEN

IF OperandSize = 32
THEN

ESP <- Pop();

SS <- Pop(); (* 32-bit pop, high-order 16 bits discarded *)
ELSE IF OperandSize = 16
THEN

ESP <- Pop(); (* 16-bit pop; clear upper bits *)

SS <- Pop(); (* 16-bit pop *)

ELSE (* OperandSize = 64 *)

RSP <- Pop();

SS <- Pop(); (* 64-bit pop, high-order 48 bits discarded *)
FI;

FI;

GOTO RETURN-TO-SAME-PRIVILEGE-LEVEL; FI;
END; 
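The flag-restoration rules of the RETURN-TO-SAME-PRIVILEGE-LEVEL branch can be expressed as a bit mask. This sketch models only that branch (the outer-privilege branch differs slightly, for example in how it treats VM), uses hypothetical function names, and assumes the IF test reads CPL ≤ IOPL.

```python
# EFLAGS bit positions
CF, PF, AF, ZF, SF, TF, IF, DF, OF = 0, 2, 4, 6, 7, 8, 9, 10, 11
NT, RF, AC, VIF, VIP, ID = 14, 16, 18, 19, 20, 21
IOPL_MASK = 0x3 << 12

def bits(*positions: int) -> int:
    return sum(1 << p for p in positions)

def same_level_restore_mask(cpl: int, iopl: int, operand_size: int) -> int:
    """EFLAGS bits that RETURN-TO-SAME-PRIVILEGE-LEVEL copies from tempEFLAGS."""
    mask = bits(CF, PF, AF, ZF, SF, TF, DF, OF, NT)
    if operand_size in (32, 64):
        mask |= bits(RF, AC, ID)
    if cpl <= iopl:
        mask |= bits(IF)
    if cpl == 0:
        mask |= IOPL_MASK
        if operand_size in (32, 64):
            mask |= bits(VIF, VIP)
    return mask

def iret_same_level_flags(eflags: int, temp_eflags: int,
                          cpl: int, operand_size: int) -> int:
    """Merge the popped image into EFLAGS; unmasked bits keep their values."""
    iopl = (eflags >> 12) & 0x3      # IOPL before the update
    m = same_level_restore_mask(cpl, iopl, operand_size)
    return (eflags & ~m) | (temp_eflags & m)
```

With these rules, a CPL 3 handler running with IOPL 0 cannot use IRET to clear IF: the popped IF bit is simply ignored.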
Flags Affected

All the flags and fields in the EFLAGS register are potentially modified, depending on the mode of operation of the 
processor. If performing a return from a nested task to a previous task, the EFLAGS register will be modified 
according to the EFLAGS image stored in the previous task's TSS. 
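For a return in real-address mode with a 32-bit operand size, the Operation section gives the EFLAGS update as a pair of masks. The sketch below applies them literally; 1A0000H selects VM (bit 17), VIF (bit 19), and VIP (bit 20), which therefore keep their current values. The helper name is hypothetical, and reserved bit 1, which always reads as 1 on real hardware, is not modeled.

```python
KEEP_FROM_STACK = 0x257FD5  # flags loaded from the popped EFLAGS image
KEEP_CURRENT = 0x1A0000     # VM (bit 17), VIF (bit 19), VIP (bit 20)

def iret_real_mode_eflags(eflags: int, temp_eflags: int, operand_size: int) -> int:
    """Apply the REAL-ADDRESS-MODE EFLAGS update from the Operation section."""
    if operand_size == 32:
        return (temp_eflags & KEEP_FROM_STACK) | (eflags & KEEP_CURRENT)
    # 16-bit operand size: only EFLAGS[15:0] is replaced.
    return (eflags & ~0xFFFF) | (temp_eflags & 0xFFFF)
```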

Protected Mode Exceptions 

#GP(0) If the return code or stack segment selector is NULL.

If the return instruction pointer is not within the return code segment limit.

#GP(selector) If a segment selector index is outside its descriptor table limits.

If the return code segment selector RPL is less than the CPL.

If the DPL of a conforming-code segment is greater than the return code segment selector
RPL.

If the DPL for a nonconforming-code segment is not equal to the RPL of the code segment
selector.

If the stack segment descriptor DPL is not equal to the RPL of the return code segment
selector.

If the stack segment is not a writable data segment.

If the stack segment selector RPL is not equal to the RPL of the return code segment selector.

If the segment descriptor for a code segment does not indicate it is a code segment.

If the segment selector for a TSS has its local/global bit set for local.

If a TSS segment descriptor specifies that the TSS is not busy.

If a TSS segment descriptor specifies that the TSS is not available.

#SS(0) If the top bytes of stack are not within stack limits.

#NP(selector) If the return code or stack segment is not present.

#PF(fault-code) If a page fault occurs.

#AC(0) If an unaligned memory reference occurs when the CPL is 3 and alignment checking is
enabled.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#GP If the return instruction pointer is not within the return code segment limit. 

#SS If the top bytes of stack are not within stack limits. 

Virtual-8086 Mode Exceptions

#GP(0) If the return instruction pointer is not within the return code segment limit.

If IOPL is not equal to 3.

#PF(fault-code) If a page fault occurs.

#SS(0) If the top bytes of stack are not within stack limits.

#AC(0) If an unaligned memory reference occurs and alignment checking is enabled.

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

#GP(0) If EFLAGS.NT[bit 14] = 1 

Other exceptions same as in Protected Mode. 
64-Bit Mode Exceptions

#GP(0) If EFLAGS.NT[bit 14] = 1. 

If the return code segment selector is NULL. 

If the stack segment selector is NULL going back to compatibility mode. 

If the stack segment selector is NULL going back to CPL3 64-bit mode. 

If a NULL stack segment selector RPL is not equal to CPL going back to non-CPL3 64-bit mode.
If the return instruction pointer is not within the return code segment limit. 

If the return instruction pointer is non-canonical. 

#GP(selector) If a segment selector index is outside its descriptor table limits.

If a segment descriptor memory address is non-canonical. 

If the segment descriptor for a code segment does not indicate it is a code segment. 

If the proposed new code segment descriptor has both the D-bit and L-bit set. 

If the DPL for a nonconforming-code segment is not equal to the RPL of the code segment 
selector. 

If CPL is greater than the RPL of the code segment selector. 

If the DPL of a conforming-code segment is greater than the return code segment selector 
RPL. 


If the stack segment is not a writable data segment.

If the stack segment descriptor DPL is not equal to the RPL of the return code segment
selector.

If the stack segment selector RPL is not equal to the RPL of the return code segment selector.

#SS(0) If an attempt to pop a value off the stack violates the SS limit.

If an attempt to pop a value off the stack causes a non-canonical address to be referenced.

#NP(selector) If the return code or stack segment is not present.

#PF(fault-code) If a page fault occurs.

#AC(0) If an unaligned memory reference occurs when the CPL is 3 and alignment checking is
enabled.

#UD If the LOCK prefix is used.
Jcc—Jump if Condition Is Met


Opcode | Instruction | Op/En | 64-Bit Mode | Compat/Leg Mode | Description
77 cb | JA rel8 | D | Valid | Valid | Jump short if above (CF=0 and ZF=0).
73 cb | JAE rel8 | D | Valid | Valid | Jump short if above or equal (CF=0).
72 cb | JB rel8 | D | Valid | Valid | Jump short if below (CF=1).
76 cb | JBE rel8 | D | Valid | Valid | Jump short if below or equal (CF=1 or ZF=1).
72 cb | JC rel8 | D | Valid | Valid | Jump short if carry (CF=1).
E3 cb | JCXZ rel8 | D | N.E. | Valid | Jump short if CX register is 0.
E3 cb | JECXZ rel8 | D | Valid | Valid | Jump short if ECX register is 0.
E3 cb | JRCXZ rel8 | D | Valid | N.E. | Jump short if RCX register is 0.
74 cb | JE rel8 | D | Valid | Valid | Jump short if equal (ZF=1).
7F cb | JG rel8 | D | Valid | Valid | Jump short if greater (ZF=0 and SF=OF).
7D cb | JGE rel8 | D | Valid | Valid | Jump short if greater or equal (SF=OF).
7C cb | JL rel8 | D | Valid | Valid | Jump short if less (SF≠OF).
7E cb | JLE rel8 | D | Valid | Valid | Jump short if less or equal (ZF=1 or SF≠OF).
76 cb | JNA rel8 | D | Valid | Valid | Jump short if not above (CF=1 or ZF=1).
72 cb | JNAE rel8 | D | Valid | Valid | Jump short if not above or equal (CF=1).
73 cb | JNB rel8 | D | Valid | Valid | Jump short if not below (CF=0).
77 cb | JNBE rel8 | D | Valid | Valid | Jump short if not below or equal (CF=0 and ZF=0).
73 cb | JNC rel8 | D | Valid | Valid | Jump short if not carry (CF=0).
75 cb | JNE rel8 | D | Valid | Valid | Jump short if not equal (ZF=0).
7E cb | JNG rel8 | D | Valid | Valid | Jump short if not greater (ZF=1 or SF≠OF).
7C cb | JNGE rel8 | D | Valid | Valid | Jump short if not greater or equal (SF≠OF).
7D cb | JNL rel8 | D | Valid | Valid | Jump short if not less (SF=OF).
7F cb | JNLE rel8 | D | Valid | Valid | Jump short if not less or equal (ZF=0 and SF=OF).
71 cb | JNO rel8 | D | Valid | Valid | Jump short if not overflow (OF=0).
7B cb | JNP rel8 | D | Valid | Valid | Jump short if not parity (PF=0).
79 cb | JNS rel8 | D | Valid | Valid | Jump short if not sign (SF=0).
75 cb | JNZ rel8 | D | Valid | Valid | Jump short if not zero (ZF=0).
70 cb | JO rel8 | D | Valid | Valid | Jump short if overflow (OF=1).
7A cb | JP rel8 | D | Valid | Valid | Jump short if parity (PF=1).
7A cb | JPE rel8 | D | Valid | Valid | Jump short if parity even (PF=1).
7B cb | JPO rel8 | D | Valid | Valid | Jump short if parity odd (PF=0).
78 cb | JS rel8 | D | Valid | Valid | Jump short if sign (SF=1).
74 cb | JZ rel8 | D | Valid | Valid | Jump short if zero (ZF=1).
0F 87 cw | JA rel16 | D | N.S. | Valid | Jump near if above (CF=0 and ZF=0). Not supported in 64-bit mode.
0F 87 cd | JA rel32 | D | Valid | Valid | Jump near if above (CF=0 and ZF=0).
0F 83 cw | JAE rel16 | D | N.S. | Valid | Jump near if above or equal (CF=0). Not supported in 64-bit mode.
0F 83 cd | JAE rel32 | D | Valid | Valid | Jump near if above or equal (CF=0).
0F 82 cw | JB rel16 | D | N.S. | Valid | Jump near if below (CF=1). Not supported in 64-bit mode.
0F 82 cd | JB rel32 | D | Valid | Valid | Jump near if below (CF=1).
0F 86 cw | JBE rel16 | D | N.S. | Valid | Jump near if below or equal (CF=1 or ZF=1). Not supported in 64-bit mode.
0F 86 cd | JBE rel32 | D | Valid | Valid | Jump near if below or equal (CF=1 or ZF=1).
0F 82 cw | JC rel16 | D | N.S. | Valid | Jump near if carry (CF=1). Not supported in 64-bit mode.
0F 82 cd | JC rel32 | D | Valid | Valid | Jump near if carry (CF=1).
0F 84 cw | JE rel16 | D | N.S. | Valid | Jump near if equal (ZF=1). Not supported in 64-bit mode.
0F 84 cd | JE rel32 | D | Valid | Valid | Jump near if equal (ZF=1).
0F 84 cw | JZ rel16 | D | N.S. | Valid | Jump near if 0 (ZF=1). Not supported in 64-bit mode.
0F 84 cd | JZ rel32 | D | Valid | Valid | Jump near if 0 (ZF=1).
0F 8F cw | JG rel16 | D | N.S. | Valid | Jump near if greater (ZF=0 and SF=OF). Not supported in 64-bit mode.
0F 8F cd | JG rel32 | D | Valid | Valid | Jump near if greater (ZF=0 and SF=OF).
0F 8D cw | JGE rel16 | D | N.S. | Valid | Jump near if greater or equal (SF=OF). Not supported in 64-bit mode.
0F 8D cd | JGE rel32 | D | Valid | Valid | Jump near if greater or equal (SF=OF).
0F 8C cw | JL rel16 | D | N.S. | Valid | Jump near if less (SF≠OF). Not supported in 64-bit mode.
0F 8C cd | JL rel32 | D | Valid | Valid | Jump near if less (SF≠OF).
0F 8E cw | JLE rel16 | D | N.S. | Valid | Jump near if less or equal (ZF=1 or SF≠OF). Not supported in 64-bit mode.
0F 8E cd | JLE rel32 | D | Valid | Valid | Jump near if less or equal (ZF=1 or SF≠OF).
0F 86 cw | JNA rel16 | D | N.S. | Valid | Jump near if not above (CF=1 or ZF=1). Not supported in 64-bit mode.
0F 86 cd | JNA rel32 | D | Valid | Valid | Jump near if not above (CF=1 or ZF=1).
0F 82 cw | JNAE rel16 | D | N.S. | Valid | Jump near if not above or equal (CF=1). Not supported in 64-bit mode.
0F 82 cd | JNAE rel32 | D | Valid | Valid | Jump near if not above or equal (CF=1).
0F 83 cw | JNB rel16 | D | N.S. | Valid | Jump near if not below (CF=0). Not supported in 64-bit mode.
0F 83 cd | JNB rel32 | D | Valid | Valid | Jump near if not below (CF=0).
0F 87 cw | JNBE rel16 | D | N.S. | Valid | Jump near if not below or equal (CF=0 and ZF=0). Not supported in 64-bit mode.
0F 87 cd | JNBE rel32 | D | Valid | Valid | Jump near if not below or equal (CF=0 and ZF=0).
0F 83 cw | JNC rel16 | D | N.S. | Valid | Jump near if not carry (CF=0). Not supported in 64-bit mode.
0F 83 cd | JNC rel32 | D | Valid | Valid | Jump near if not carry (CF=0).
0F 85 cw | JNE rel16 | D | N.S. | Valid | Jump near if not equal (ZF=0). Not supported in 64-bit mode.
0F 85 cd | JNE rel32 | D | Valid | Valid | Jump near if not equal (ZF=0).
0F 8E cw | JNG rel16 | D | N.S. | Valid | Jump near if not greater (ZF=1 or SF≠OF). Not supported in 64-bit mode.
0F 8E cd | JNG rel32 | D | Valid | Valid | Jump near if not greater (ZF=1 or SF≠OF).
0F 8C cw | JNGE rel16 | D | N.S. | Valid | Jump near if not greater or equal (SF≠OF). Not supported in 64-bit mode.
0F 8C cd | JNGE rel32 | D | Valid | Valid | Jump near if not greater or equal (SF≠OF).
0F 8D cw | JNL rel16 | D | N.S. | Valid | Jump near if not less (SF=OF). Not supported in 64-bit mode.
0F 8D cd | JNL rel32 | D | Valid | Valid | Jump near if not less (SF=OF).
0F 8F cw | JNLE rel16 | D | N.S. | Valid | Jump near if not less or equal (ZF=0 and SF=OF). Not supported in 64-bit mode.
0F 8F cd | JNLE rel32 | D | Valid | Valid | Jump near if not less or equal (ZF=0 and SF=OF).
0F 81 cw | JNO rel16 | D | N.S. | Valid | Jump near if not overflow (OF=0). Not supported in 64-bit mode.
0F 81 cd | JNO rel32 | D | Valid | Valid | Jump near if not overflow (OF=0).
0F 8B cw | JNP rel16 | D | N.S. | Valid | Jump near if not parity (PF=0). Not supported in 64-bit mode.
0F 8B cd | JNP rel32 | D | Valid | Valid | Jump near if not parity (PF=0).
0F 89 cw | JNS rel16 | D | N.S. | Valid | Jump near if not sign (SF=0). Not supported in 64-bit mode.
0F 89 cd | JNS rel32 | D | Valid | Valid | Jump near if not sign (SF=0).
0F 85 cw | JNZ rel16 | D | N.S. | Valid | Jump near if not zero (ZF=0). Not supported in 64-bit mode.
0F 85 cd | JNZ rel32 | D | Valid | Valid | Jump near if not zero (ZF=0).
0F 80 cw | JO rel16 | D | N.S. | Valid | Jump near if overflow (OF=1). Not supported in 64-bit mode.
0F 80 cd | JO rel32 | D | Valid | Valid | Jump near if overflow (OF=1).
0F 8A cw | JP rel16 | D | N.S. | Valid | Jump near if parity (PF=1). Not supported in 64-bit mode.
0F 8A cd | JP rel32 | D | Valid | Valid | Jump near if parity (PF=1).
0F 8A cw | JPE rel16 | D | N.S. | Valid | Jump near if parity even (PF=1). Not supported in 64-bit mode.
0F 8A cd | JPE rel32 | D | Valid | Valid | Jump near if parity even (PF=1).
0F 8B cw | JPO rel16 | D | N.S. | Valid | Jump near if parity odd (PF=0). Not supported in 64-bit mode.
0F 8B cd | JPO rel32 | D | Valid | Valid | Jump near if parity odd (PF=0).
0F 88 cw | JS rel16 | D | N.S. | Valid | Jump near if sign (SF=1). Not supported in 64-bit mode.
0F 88 cd | JS rel32 | D | Valid | Valid | Jump near if sign (SF=1).
0F 84 cw | JZ rel16 | D | N.S. | Valid | Jump near if 0 (ZF=1). Not supported in 64-bit mode.
0F 84 cd | JZ rel32 | D | Valid | Valid | Jump near if 0 (ZF=1).


Instruction Operand Encoding

Op/En | Operand 1 | Operand 2 | Operand 3 | Operand 4
D | Offset | NA | NA | NA


Description 

Checks the state of one or more of the status flags in the EFLAGS register (CF, OF, PF, SF, and ZF) and, if the flags 
are in the specified state (condition), performs a jump to the target instruction specified by the destination 
operand. A condition code (cc) is associated with each instruction to indicate the condition being tested for. If the 
condition is not satisfied, the jump is not performed and execution continues with the instruction following the Jcc 
instruction. 

The target instruction is specified with a relative offset (a signed offset relative to the current value of the
instruction pointer in the EIP register). A relative offset (rel8, rel16, or rel32) is generally specified as a label in
assembly code, but at the machine code level, it is encoded as a signed 8-bit, 16-bit, or 32-bit immediate value,
which is added to the instruction pointer. Instruction coding is most efficient for offsets of -128 to +127. If the
operand-size attribute is 16, the upper two bytes of the EIP register are cleared, resulting in a maximum instruction
pointer size of 16 bits.
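The preference for short offsets can be seen in a small assembler-style helper. This is a sketch for 32-bit or 64-bit code only (the rel16 forms are not emitted); `encode_jcc` is a hypothetical name, and `cc` is assumed to be the 4-bit condition number that forms the low nibble of the 7x and 0F 8x opcodes, so 4 selects JE/JZ. Because the offset is relative to the next instruction, it is computed from the end of whichever form is emitted.

```python
import struct

def encode_jcc(cc: int, ip: int, target: int) -> bytes:
    """Encode Jcc located at ip that jumps to target, preferring the short form."""
    disp8 = target - (ip + 2)                  # short form: 7x cb, 2 bytes
    if -128 <= disp8 <= 127:
        return bytes([0x70 | cc]) + struct.pack("<b", disp8)
    disp32 = target - (ip + 6)                 # near form: 0F 8x cd, 6 bytes
    if not -(1 << 31) <= disp32 < (1 << 31):
        raise ValueError("target outside rel32 range")
    return bytes([0x0F, 0x80 | cc]) + struct.pack("<i", disp32)
```

A JZ to itself encodes as 74 FE (offset -2); a target 0x1000 bytes away no longer fits in rel8 and needs the 6-byte near form.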

The conditions for each Jcc mnemonic are given in the "Description" column of the opcode table above. The
terms "less" and "greater" are used for comparisons of signed integers and the terms "above" and "below" are used 
for unsigned integers. 

Because a particular state of the status flags can sometimes be interpreted in two ways, two mnemonics are 
defined for some opcodes. For example, the JA (jump if above) instruction and the JNBE (jump if not below or 
equal) instruction are alternate mnemonics for the opcode 77H.
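Each condition in the table depends only on the low nibble of the opcode, which is why aliases such as JA and JNBE share 77H. The sketch below, with hypothetical helper names, evaluates a condition number against the flags and derives those flags from an 8-bit CMP so that the signed (less/greater) and unsigned (below/above) readings can be compared.

```python
def jcc_taken(cc: int, cf: int, zf: int, sf: int, of: int, pf: int) -> bool:
    """Evaluate condition number cc (low nibble of 7x / 0F 8x) on the flags."""
    result = [
        of == 1,               # 0: O
        cf == 1,               # 2: B / C / NAE
        zf == 1,               # 4: E / Z
        cf == 1 or zf == 1,    # 6: BE / NA
        sf == 1,               # 8: S
        pf == 1,               # A: P / PE
        sf != of,              # C: L / NGE
        zf == 1 or sf != of,   # E: LE / NG
    ][cc >> 1]
    # An odd cc tests the negated condition (for example, 75H = JNE/JNZ).
    return result != bool(cc & 1)

def cmp_flags(a: int, b: int, width: int = 8) -> dict:
    """Flags set by CMP a, b on width-bit operands (a and b given unsigned)."""
    mask = (1 << width) - 1
    r = (a - b) & mask
    sign = 1 << (width - 1)
    return {
        "cf": int(a < b),                               # unsigned borrow
        "zf": int(r == 0),
        "sf": int(bool(r & sign)),
        "of": int(bool((a ^ b) & (a ^ r) & sign)),      # signed overflow
        "pf": int(bin(r & 0xFF).count("1") % 2 == 0),   # even parity, low byte
    }
```

For CMP 0FFH, 01H, JA is taken (255 > 1 unsigned) but JG is not (-1 > 1 is false when signed).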

The Jcc instruction does not support far jumps (jumps to other code segments). When the target for the conditional 
jump is in a different segment, use the opposite condition from the condition being tested for the Jcc instruction, 
and then access the target with an unconditional far jump (JMP instruction) to the other segment. For example, the 
following conditional far jump is illegal: 

JZ FARLABEL; 

To accomplish this far jump, use the following two instructions: 

JNZ BEYOND; 

JMP FARLABEL; 

BEYOND: 
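The same two-instruction pattern can be emitted as bytes. This sketch assumes a 32-bit code segment, where far JMP ptr16:32 is encoded as EA followed by a 4-byte offset and a 2-byte selector (that form is not valid in 64-bit mode); it relies on the fact that flipping bit 0 of the condition number yields the opposite condition. The function name is hypothetical.

```python
import struct

def far_jcc_32(cc: int, selector: int, offset: int) -> bytes:
    """Emit 'J(not cc) BEYOND; JMP selector:offset; BEYOND:' for 32-bit code."""
    far_jmp = bytes([0xEA]) + struct.pack("<IH", offset, selector)
    # Short Jcc with the opposite condition, skipping over the far JMP.
    skip = bytes([0x70 | (cc ^ 1), len(far_jmp)])
    return skip + far_jmp
```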

The JRCXZ, JECXZ and JCXZ instructions differ from other Jcc instructions because they do not check status flags. 
Instead, they check RCX, ECX or CX for 0. The register checked is determined by the address-size attribute. These 
instructions are useful when used at the beginning of a loop that terminates with a conditional loop instruction 
(such as LOOPNE). They can be used to prevent an instruction sequence from entering a loop when RCX, ECX or CX 
is 0. This would cause the loop to execute 2^64, 2^32, or 64K times (not zero times).
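The effect of a zero counter can be checked with a small model of LOOP. The sketch assumes a loop body that does not itself modify the counter; the function names are hypothetical, and the step-by-step version is only practical for the 16-bit case.

```python
def loop_body_runs(count: int, addr_bits: int, guard: bool) -> int:
    """How many times a body ending in LOOP executes (closed form).

    addr_bits (16, 32, or 64) selects CX, ECX, or RCX. With guard=True,
    a JCXZ/JECXZ/JRCXZ before the loop skips it when the counter is 0.
    """
    mask = (1 << addr_bits) - 1
    count &= mask
    if guard and count == 0:
        return 0                    # the zero-counter jump bypasses the loop
    if count == 0:
        return 1 << addr_bits       # LOOP first decrements 0 to all ones
    return count

def loop_body_runs_simulated(count: int, addr_bits: int) -> int:
    """Step-by-step check of the closed form."""
    mask = (1 << addr_bits) - 1
    count &= mask
    runs = 0
    while True:
        runs += 1                   # execute the loop body
        count = (count - 1) & mask  # LOOP decrements the counter
        if count == 0:              # and falls through only at 0
            return runs
```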

All conditional jumps are converted to code fetches of one or two cache lines, regardless of jump address or
cacheability.

In 64-bit mode, operand size is fixed at 64 bits. The short form computes RIP = RIP + 8-bit offset sign-extended to 64 bits.
The near form computes RIP = RIP + 32-bit offset sign-extended to 64 bits.
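The sketch below shows the 64-bit target computation. It assumes that `next_rip` holds the address of the instruction following the jump. The function names are hypothetical, and the casts perform the sign extension. The sketch does not model the canonical-address check that the processor also applies.

```c
#include <stdint.h>

/* next_rip: address of the instruction that follows the jump. */
uint64_t target_rel8(uint64_t next_rip, uint8_t rel8)
{
    int64_t disp = (int8_t)rel8;          /* sign-extend 8 -> 64 bits */
    return next_rip + (uint64_t)disp;     /* addition wraps modulo 2^64 */
}

uint64_t target_rel32(uint64_t next_rip, uint32_t rel32)
{
    int64_t disp = (int32_t)rel32;        /* sign-extend 32 -> 64 bits */
    return next_rip + (uint64_t)disp;
}
```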
Operation

IF condition
    THEN
        tempEIP ← EIP + SignExtend(DEST);
        IF OperandSize = 16
            THEN tempEIP ← tempEIP AND 0000FFFFH;
        FI;
        IF tempEIP is not within code segment limit
            THEN #GP(0);
            ELSE EIP ← tempEIP
        FI;
FI;

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If the offset being jumped to is beyond the limits of the CS segment. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If the offset being jumped to is beyond the limits of the CS segment or is outside of the effective
address space from 0 to FFFFH. This condition can occur if a 32-bit address size override
prefix is used.

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions

Same exceptions as in real address mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#GP(0) If the memory address is in a non-canonical form. 

#UD If the LOCK prefix is used. 
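A linear address is canonical when every bit above the implemented linear-address width repeats the most significant implemented bit. The sketch below takes that width as a parameter: 48 with 4-level paging, 57 with 5-level paging. `is_canonical` is a hypothetical helper name.

```c
#include <stdbool.h>
#include <stdint.h>

/* bits: implemented linear-address width, e.g. 48 or 57. */
bool is_canonical(uint64_t addr, unsigned bits)
{
    uint64_t upper = addr >> (bits - 1);          /* bits 63:(bits-1) */
    uint64_t all_ones = UINT64_MAX >> (bits - 1); /* that field with all bits set */
    return upper == 0 || upper == all_ones;
}
```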
JMP—Jump


Opcode         Instruction   Op/En  64-Bit Mode  Compat/Leg Mode  Description
EB cb          JMP rel8      D      Valid        Valid            Jump short, RIP = RIP + 8-bit displacement sign extended to 64 bits.
E9 cw          JMP rel16     D      N.S.         Valid            Jump near, relative, displacement relative to next instruction. Not supported in 64-bit mode.
E9 cd          JMP rel32     D      Valid        Valid            Jump near, relative, RIP = RIP + 32-bit displacement sign extended to 64 bits.
FF /4          JMP r/m16     M      N.S.         Valid            Jump near, absolute indirect, address = zero-extended r/m16. Not supported in 64-bit mode.
FF /4          JMP r/m32     M      N.S.         Valid            Jump near, absolute indirect, address given in r/m32. Not supported in 64-bit mode.
FF /4          JMP r/m64     M      Valid        N.E.             Jump near, absolute indirect, RIP = 64-bit offset from register or memory.
EA cd          JMP ptr16:16  D      Inv.         Valid            Jump far, absolute, address given in operand.
EA cp          JMP ptr16:32  D      Inv.         Valid            Jump far, absolute, address given in operand.
FF /5          JMP m16:16    D      Valid        Valid            Jump far, absolute indirect, address given in m16:16.
FF /5          JMP m16:32    D      Valid        Valid            Jump far, absolute indirect, address given in m16:32.
REX.W + FF /5  JMP m16:64    D      Valid        N.E.             Jump far, absolute indirect, address given in m16:64.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2  Operand 3  Operand 4
D      Offset         NA         NA         NA
M      ModRM:r/m (r)  NA         NA         NA


Description 

Transfers program control to a different point in the instruction stream without recording return information. The 
destination (target) operand specifies the address of the instruction being jumped to. This operand can be an 
immediate value, a general-purpose register, or a memory location. 

This instruction can be used to execute four different types of jumps: 

• Near jump—A jump to an instruction within the current code segment (the segment currently pointed to by the 
CS register), sometimes referred to as an intrasegment jump. 

• Short jump—A near jump where the jump range is limited to -128 to +127 from the current EIP value. 

• Far jump—A jump to an instruction located in a different segment than the current code segment but at the 
same privilege level, sometimes referred to as an intersegment jump. 

• Task switch—A jump to an instruction located in a different task. 

A task switch can only be executed in protected mode (see Chapter 7, in the Intel® 64 and IA-32 Architectures 
Software Developer's Manual, Volume 3A, for information on performing task switches with the JMP instruction). 

Near and Short Jumps. When executing a near jump, the processor jumps to the address (within the current code
segment) that the target operand specifies. The target operand specifies either an absolute offset (that is,
an offset from the base of the code segment) or a relative offset (a signed displacement relative to the current
value of the instruction pointer in the EIP register). A near jump to an 8-bit relative offset (rel8) is called
a short jump. Near and short jumps do not change the CS register.

An absolute offset is specified indirectly in a general-purpose register or a memory location (r/m16 or r/m32). The
operand-size attribute determines the size of the target operand (16 or 32 bits). Absolute offsets are loaded
directly into the EIP register. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared,
resulting in a maximum instruction pointer size of 16 bits.
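The sketch below shows how an absolute indirect near target is applied outside 64-bit mode, following the paragraph above. The value read from the operand is loaded into EIP, and with a 16-bit operand size only the low 16 bits are used. The hypothetical `cs_limit` parameter stands for a simple limit check: a return value of false represents #GP(0). The sketch does not model expand-down segments or any other descriptor details.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of JMP r/m16 / JMP r/m32 outside 64-bit mode.
 * rm_value: value read from the register or memory operand.
 * Returns false where the processor would raise #GP(0). */
bool jmp_near_absolute(uint32_t rm_value, int operand_size,
                       uint32_t cs_limit, uint32_t *eip)
{
    uint32_t temp_eip = (operand_size == 16)
                            ? (rm_value & 0x0000FFFFu)  /* zero-extended r/m16 */
                            : rm_value;                 /* r/m32 */
    if (temp_eip > cs_limit)
        return false;                                   /* #GP(0) */
    *eip = temp_eip;                /* upper two bytes already clear for 16-bit */
    return true;
}
```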

A relative offset (rel8, rel16, or rel32) is generally specified as a label in assembly code. At the machine code
level, it is encoded as a signed 8-, 16-, or 32-bit immediate value, which is added to the value in the EIP
register. (Here, the EIP register contains the address of the instruction following the JMP instruction.) When using
relative offsets, the opcode (for short vs. near jumps) and the operand-size attribute (for near relative jumps)
determine the size of the target operand (8, 16, or 32 bits).
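Because the displacement is measured from the end of the JMP instruction, its value depends on which encoding is chosen. EB cb is 2 bytes long; E9 cd is 5 bytes long in 32- or 64-bit code. The assembler-side sketch below assumes a 32-bit operand size (it does not cover the 3-byte E9 cw form) and prefers the short form when the displacement fits in a signed byte. `encode_jmp_rel` is a hypothetical helper, not part of any real assembler API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical encoder for JMP rel8 / JMP rel32 (32- or 64-bit code).
 * insn_addr: address where the JMP will be placed; target: destination.
 * Writes the encoding to out and returns its length (2 or 5 bytes).
 * Does not check that the rel32 displacement actually fits in 32 bits. */
size_t encode_jmp_rel(uint64_t insn_addr, uint64_t target, uint8_t out[5])
{
    /* rel8 is measured from the end of the 2-byte EB cb form. */
    int64_t short_disp = (int64_t)(target - (insn_addr + 2));
    if (short_disp >= -128 && short_disp <= 127) {
        out[0] = 0xEB;
        out[1] = (uint8_t)short_disp;
        return 2;
    }
    /* rel32 is measured from the end of the 5-byte E9 cd form. */
    uint32_t near_disp = (uint32_t)(target - (insn_addr + 5));
    out[0] = 0xE9;
    for (int i = 0; i < 4; i++)            /* little-endian immediate */
        out[1 + i] = (uint8_t)(near_disp >> (8 * i));
    return 5;
}
```

For example, a jump to itself encodes as EB FE, because the target lies 2 bytes before the end of the instruction.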

Far Jumps in Real-Address or Virtual-8086 Mode. When executing a far jump in real-address or virtual-8086 mode,
the processor jumps to the code segment and offset specified with the target operand. Here the target operand
specifies an absolute far address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory
location (m16:16 or m16:32). With the pointer method, the segment and offset of the jump target are
encoded in the instruction, using a 4-byte (16-bit operand size) or 6-byte (32-bit operand size) far address
immediate. With the indirect method, the target operand specifies a memory location that contains a 4-byte (16-bit
operand size) or 6-byte (32-bit operand size) far address. The far address is loaded directly into the CS and EIP
registers. If the operand-size attribute is 16, the upper two bytes of the EIP register are cleared.
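In memory, a far pointer stores the offset first (at the lower address), followed by the 16-bit segment selector, both in little-endian order. The sketch below decodes an m16:16 or m16:32 operand under that layout. The struct and function names are hypothetical.

```c
#include <stdint.h>

struct far_ptr { uint16_t selector; uint32_t offset; };

/* Decode a far pointer from memory bytes: operand_size 16 -> m16:16 (4 bytes),
 * operand_size 32 -> m16:32 (6 bytes). The offset comes first, then the selector. */
struct far_ptr decode_far_ptr(const uint8_t *mem, int operand_size)
{
    struct far_ptr p;
    int off_bytes = (operand_size == 16) ? 2 : 4;
    p.offset = 0;
    for (int i = 0; i < off_bytes; i++)
        p.offset |= (uint32_t)mem[i] << (8 * i);
    p.selector = (uint16_t)(mem[off_bytes] | (mem[off_bytes + 1] << 8));
    return p;
}
```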

Far Jumps in Protected Mode. When the processor is operating in protected mode, the JMP instruction can be used 
to perform the following three types of far jumps: 

• A far jump to a conforming or non-conforming code segment. 

• A far jump through a call gate. 

• A task switch. 

(The JMP instruction cannot be used to perform inter-privilege-level far jumps.) 

In protected mode, the processor always uses the segment selector part of the far address to access the corresponding
descriptor in the GDT or LDT. The descriptor type (code segment, call gate, task gate, or TSS) and access
rights determine the type of jump to be performed. 
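A segment selector packs three fields: the requested privilege level (RPL) in bits 1:0, the table indicator (TI) in bit 2 (0 selects the GDT, 1 the LDT), and the descriptor index in bits 15:3. The sketch below decodes these fields; the names are hypothetical. Legacy descriptors are 8 bytes, so the byte offset of a descriptor within its table is index * 8. The sketch does not model the IA-32e case in which call-gate and TSS descriptors occupy 16 bytes.

```c
#include <stdint.h>

struct selector_fields {
    unsigned rpl;    /* bits 1:0, requested privilege level */
    unsigned ti;     /* bit 2, table indicator: 0 = GDT, 1 = LDT */
    unsigned index;  /* bits 15:3, descriptor index */
};

struct selector_fields decode_selector(uint16_t sel)
{
    struct selector_fields f;
    f.rpl = sel & 0x3u;
    f.ti = (sel >> 2) & 0x1u;
    f.index = sel >> 3;
    return f;
}

/* Byte offset of an 8-byte descriptor within the GDT or LDT. */
uint32_t descriptor_offset(uint16_t sel)
{
    return (uint32_t)(sel >> 3) * 8u;
}
```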

If the selected descriptor is for a code segment, a far jump to a code segment at the same privilege level is
performed. (If the selected code segment is at a different privilege level and the code segment is non-conforming,
a general-protection exception is generated.) A far jump to the same privilege level in protected mode is very
similar to one carried out in real-address or virtual-8086 mode. The target operand specifies an absolute far
address either directly with a pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or
m16:32). The operand-size attribute determines the size of the offset (16 or 32 bits) in the far address. The new
code segment selector and its descriptor are loaded into the CS register, and the offset from the instruction is loaded
into the EIP register. Note that a call gate (described in the next paragraph) can also be used to perform a far jump to
a code segment at the same privilege level. This mechanism provides an extra level of indirection and is the
preferred method of making jumps between 16-bit and 32-bit code segments.
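The sketch below covers the privilege rules that the CONFORMING-CODE-SEGMENT and NONCONFORMING-CODE-SEGMENT steps of the Operation section apply to a direct far JMP. A conforming target requires DPL ≤ CPL. A nonconforming target requires RPL ≤ CPL and DPL = CPL. The sketch omits presence, type, and limit checks, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Returns true if a direct far JMP to a code segment passes the privilege
 * checks; false corresponds to #GP(segment selector). */
bool far_jmp_privilege_ok(bool conforming, unsigned cpl, unsigned rpl, unsigned dpl)
{
    if (conforming)
        return dpl <= cpl;              /* IF DPL > CPL THEN #GP */
    return rpl <= cpl && dpl == cpl;    /* IF (RPL > CPL) OR (DPL != CPL) THEN #GP */
}
```

On success, CS.RPL is set to the CPL, so the jump never changes the current privilege level.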

When executing a far jump through a call gate, the segment selector specified by the target operand identifies the
call gate. (The offset part of the target operand is ignored.) The processor then jumps to the code segment specified
in the call gate descriptor and begins executing the instruction at the offset specified in the call gate. No stack
switch occurs. Here again, the target operand can specify the far address of the call gate either directly with a
pointer (ptr16:16 or ptr16:32) or indirectly with a memory location (m16:16 or m16:32).
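The sketch below follows the privilege checks in the CALL-GATE steps of the Operation section. First, the gate itself must be accessible: its DPL must be greater than or equal to both the CPL and the RPL of the gate selector. Second, the target code segment must be conforming with DPL ≤ CPL, or nonconforming with DPL = CPL, so a JMP through a call gate cannot change privilege level. The sketch omits presence, type, and limit checks, and the function name is hypothetical.

```c
#include <stdbool.h>

/* Privilege checks for JMP through a call gate. Returns false where the
 * CALL-GATE steps would raise #GP. */
bool jmp_call_gate_privilege_ok(unsigned cpl, unsigned gate_sel_rpl,
                                unsigned gate_dpl, bool target_conforming,
                                unsigned target_dpl)
{
    if (gate_dpl < cpl || gate_dpl < gate_sel_rpl)
        return false;                       /* #GP(call gate selector) */
    if (target_conforming)
        return target_dpl <= cpl;           /* else #GP(code segment selector) */
    return target_dpl == cpl;               /* nonconforming: no privilege change */
}
```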

Executing a task switch with the JMP instruction is somewhat similar to executing a jump through a call gate. Here 
the target operand specifies the segment selector of the task gate for the task being switched to (and the offset 
part of the target operand is ignored). The task gate in turn points to the TSS for the task, which contains the 
segment selectors for the task's code and stack segments. The TSS also contains the EIP value for the next instruction
that was to be executed before the task was suspended. This instruction pointer value is loaded into the EIP
register so that the task begins executing again at this next instruction. 

The JMP instruction can also specify the segment selector of the TSS directly, which eliminates the indirection of the 
task gate. See Chapter 7 in the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for
detailed information on the mechanics of a task switch. 
Note that when you execute a task switch with a JMP instruction, the nested task flag (NT) is not set in the EFLAGS
register and the new TSS's previous task link field is not loaded with the old task's TSS selector. A return to the
previous task therefore cannot be carried out by executing the IRET instruction. Switching tasks with the JMP
instruction differs in this regard from the CALL instruction, which does set the NT flag and save the previous task link
information, allowing a return to the calling task with an IRET instruction.

In 64-Bit Mode. The instruction's operation size is fixed at 64 bits. If a selector points to a gate, RIP equals
the 64-bit displacement taken from the gate; otherwise, RIP equals the zero-extended offset from the far pointer
referenced in the instruction.

See the summary chart at the beginning of this section for encoding data and limits. 

Operation 

IF near jump
    IF 64-bit Mode
        THEN
            IF near relative jump
                THEN
                    tempRIP ← RIP + DEST; (* RIP is instruction following JMP instruction*)
                ELSE (* Near absolute jump *)
                    tempRIP ← DEST;
            FI;
        ELSE
            IF near relative jump
                THEN
                    tempEIP ← EIP + DEST; (* EIP is instruction following JMP instruction*)
                ELSE (* Near absolute jump *)
                    tempEIP ← DEST;
            FI;
    FI;
    IF (IA32_EFER.LMA = 0 or target mode = Compatibility mode)
    and tempEIP outside code segment limit
        THEN #GP(0); FI
    IF 64-bit mode and tempRIP is not canonical
        THEN #GP(0);
    FI;
    IF OperandSize = 32
        THEN
            EIP ← tempEIP;
        ELSE
            IF OperandSize = 16
                THEN (* OperandSize = 16 *)
                    EIP ← tempEIP AND 0000FFFFH;
                ELSE (* OperandSize = 64 *)
                    RIP ← tempRIP;
            FI;
    FI;
FI;

IF far jump and (PE = 0 or (PE = 1 AND VM = 1)) (* Real-address or virtual-8086 mode *)
    THEN
        tempEIP ← DEST(Offset); (* DEST is ptr16:32 or [m16:32] *)
        IF tempEIP is beyond code segment limit
            THEN #GP(0); FI;
        CS ← DEST(segment selector); (* DEST is ptr16:32 or [m16:32] *)
        IF OperandSize = 32


3-490 Vol. 2A 


JMP—jump 


INSTRUCTION SET REFERENCE, A-L 


            THEN
                EIP ← tempEIP; (* DEST is ptr16:32 or [m16:32] *)
            ELSE (* OperandSize = 16 *)
                EIP ← tempEIP AND 0000FFFFH; (* Clear upper 16 bits *)
        FI;
FI;

IF far jump and (PE = 1 and VM = 0)
(* IA-32e mode or protected mode, not virtual-8086 mode *)
    THEN
        IF effective address in the CS, DS, ES, FS, GS, or SS segment is illegal
        or segment selector in target operand NULL
            THEN #GP(0); FI;
        IF segment selector index not within descriptor table limits
            THEN #GP(new selector); FI;
        Read type and access rights of segment descriptor;
        IF (EFER.LMA = 0)
            THEN
                IF segment type is not a conforming or nonconforming code
                segment, call gate, task gate, or TSS
                    THEN #GP(segment selector); FI;
            ELSE
                IF segment type is not a conforming or nonconforming code segment
                or call gate
                    THEN #GP(segment selector); FI;
        FI;
        Depending on type and access rights:
            GO TO CONFORMING-CODE-SEGMENT;
            GO TO NONCONFORMING-CODE-SEGMENT;
            GO TO CALL-GATE;
            GO TO TASK-GATE;
            GO TO TASK-STATE-SEGMENT;
    ELSE
        #GP(segment selector);
FI;

CONFORMING-CODE-SEGMENT:
    IF L-bit = 1 and D-bit = 1 and IA32_EFER.LMA = 1
        THEN #GP(new code segment selector); FI;
    IF DPL > CPL
        THEN #GP(segment selector); FI;
    IF segment not present
        THEN #NP(segment selector); FI;
    tempEIP ← DEST(Offset);
    IF OperandSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH;
    FI;
    IF (IA32_EFER.LMA = 0 or target mode = Compatibility mode) and
    tempEIP outside code segment limit
        THEN #GP(0); FI
    IF tempEIP is non-canonical
        THEN #GP(0); FI;
    CS ← DEST[segment selector]; (* Segment descriptor information also loaded *)
    CS(RPL) ← CPL
    EIP ← tempEIP;
END;
NONCONFORMING-CODE-SEGMENT:
    IF L-bit = 1 and D-bit = 1 and IA32_EFER.LMA = 1
        THEN #GP(new code segment selector); FI;
    IF (RPL > CPL) OR (DPL ≠ CPL)
        THEN #GP(code segment selector); FI;
    IF segment not present
        THEN #NP(segment selector); FI;
    tempEIP ← DEST(Offset);
    IF OperandSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH; FI;
    IF (IA32_EFER.LMA = 0 OR target mode = Compatibility mode)
    and tempEIP outside code segment limit
        THEN #GP(0); FI
    IF tempEIP is non-canonical THEN #GP(0); FI;
    CS ← DEST[segment selector]; (* Segment descriptor information also loaded *)
    CS(RPL) ← CPL;
    EIP ← tempEIP;
END;

CALL-GATE:
    IF call gate DPL < CPL
    or call gate DPL < call gate segment-selector RPL
        THEN #GP(call gate selector); FI;
    IF call gate not present
        THEN #NP(call gate selector); FI;
    IF call gate code-segment selector is NULL
        THEN #GP(0); FI;
    IF call gate code-segment selector index outside descriptor table limits
        THEN #GP(code segment selector); FI;
    Read code segment descriptor;
    IF code-segment segment descriptor does not indicate a code segment
    or code-segment segment descriptor is conforming and DPL > CPL
    or code-segment segment descriptor is non-conforming and DPL ≠ CPL
        THEN #GP(code segment selector); FI;
    IF IA32_EFER.LMA = 1 and (code-segment descriptor is not a 64-bit code segment
    or code-segment segment descriptor has both L-bit and D-bit set)
        THEN #GP(code segment selector); FI;
    IF code segment is not present
        THEN #NP(code-segment selector); FI;
    IF instruction pointer is not within code-segment limit
        THEN #GP(0); FI;
    tempEIP ← DEST(Offset);
    IF GateSize = 16
        THEN tempEIP ← tempEIP AND 0000FFFFH; FI;
    IF (IA32_EFER.LMA = 0 OR target mode = Compatibility mode) AND tempEIP
    outside code segment limit
        THEN #GP(0); FI
    CS ← DEST[SegmentSelector]; (* Segment descriptor information also loaded *)
    CS(RPL) ← CPL;
    EIP ← tempEIP;
END;

TASK-GATE:
    IF task gate DPL < CPL
    or task gate DPL < task gate segment-selector RPL
        THEN #GP(task gate selector); FI;
    IF task gate not present
        THEN #NP(gate selector); FI;
    Read the TSS segment selector in the task-gate descriptor;
    IF TSS segment selector local/global bit is set to local
    or index not within GDT limits
    or TSS descriptor specifies that the TSS is busy
        THEN #GP(TSS selector); FI;
    IF TSS not present
        THEN #NP(TSS selector); FI;
    SWITCH-TASKS to TSS;
    IF EIP not within code segment limit
        THEN #GP(0); FI;
END;

TASK-STATE-SEGMENT:
    IF TSS DPL < CPL
    or TSS DPL < TSS segment-selector RPL
    or TSS descriptor indicates TSS not available
        THEN #GP(TSS selector); FI;
    IF TSS is not present
        THEN #NP(TSS selector); FI;
    SWITCH-TASKS to TSS;
    IF EIP not within code segment limit
        THEN #GP(0); FI;
END;

Flags Affected 

All flags are affected if a task switch occurs; no flags are affected if a task switch does not occur. 

Protected Mode Exceptions 

#GP(0) If offset in target operand, call gate, or TSS is beyond the code segment limits. 

If the segment selector in the destination operand, call gate, task gate, or TSS is NULL. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#GP(selector) If the segment selector index is outside descriptor table limits. 

If the segment descriptor pointed to by the segment selector in the destination operand is not 
for a conforming-code segment, nonconforming-code segment, call gate, task gate, or task 
state segment. 

If the DPL for a nonconforming-code segment is not equal to the CPL.

(When not using a call gate.) If the RPL for the segment's segment selector is greater than the
CPL.

If the DPL for a conforming-code segment is greater than the CPL.

If the DPL from a call-gate, task-gate, or TSS segment descriptor is less than the CPL or than
the RPL of the call-gate, task-gate, or TSS's segment selector.

If the segment descriptor for selector in a call gate does not indicate it is a code segment. 

If the segment descriptor for the segment selector in a task gate does not indicate an available 
TSS. 

If the segment selector for a TSS has its local/global bit set for local. 

If a TSS segment descriptor specifies that the TSS is busy or not available. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 


#NP(selector) If the code segment being accessed is not present.

If call gate, task gate, or TSS not present.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3. (Only occurs when fetching target from memory.)

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 


#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If the LOCK prefix is used.


Virtual-8086 Mode Exceptions


#GP(0) If the target operand is beyond the code segment limits.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. (Only occurs
when fetching target from memory.)

#UD If the LOCK prefix is used.


Compatibility Mode Exceptions 

Same as 64-bit mode exceptions. 

64-Bit Mode Exceptions

#GP(0) If a memory address is non-canonical.


#GP(selector) If target offset in destination operand is non-canonical.

If target offset in destination operand is beyond the new code segment limit. 

If the segment selector in the destination operand is NULL. 

If the code segment selector in the 64-bit gate is NULL. 

If the code segment or 64-bit call gate is outside descriptor table limits. 

If the code segment or 64-bit call gate overlaps non-canonical space. 

If the segment descriptor from a 64-bit call gate is in non-canonical space. 

If the segment descriptor pointed to by the segment selector in the destination operand is not
for a conforming-code segment, nonconforming-code segment, or 64-bit call gate.

If the segment descriptor pointed to by the segment selector in the destination operand is a 
code segment, and has both the D-bit and the L-bit set. 

If the DPL for a nonconforming-code segment is not equal to the CPL, or the RPL for the
segment's segment selector is greater than the CPL.

If the DPL for a conforming-code segment is greater than the CPL.

If the DPL from a 64-bit call-gate is less than the CPL or than the RPL of the 64-bit call-gate.

If the upper type field of a 64-bit call gate is not 0x0. 

If the segment selector from a 64-bit call gate is beyond the descriptor table limits. 

If the code segment descriptor pointed to by the selector in the 64-bit gate doesn't have the L-bit
set and the D-bit clear.

If the segment descriptor for a segment selector from the 64-bit call gate does not indicate it 
is a code segment. 

If the code segment is non-conforming and CPL ≠ DPL.

If the code segment is conforming and CPL < DPL.

#NP(selector) If a code segment or 64-bit call gate is not present.

#UD (64-bit mode only) If a far jump is direct to an absolute address in memory.

If the LOCK prefix is used.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.
KADDW/KADDB/KADDQ/KADDD-ADD Two Masks


Opcode/Instruction                        Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
VEX.L1.0F.W0 4A /r KADDW k1, k2, k3       RVR    V/V                     AVX512DQ            Add 16 bits masks in k2 and k3 and place result in k1.
VEX.L1.66.0F.W0 4A /r KADDB k1, k2, k3    RVR    V/V                     AVX512DQ            Add 8 bits masks in k2 and k3 and place result in k1.
VEX.L1.0F.W1 4A /r KADDQ k1, k2, k3       RVR    V/V                     AVX512BW            Add 64 bits masks in k2 and k3 and place result in k1.
VEX.L1.66.0F.W1 4A /r KADDD k1, k2, k3    RVR    V/V                     AVX512BW            Add 32 bits masks in k2 and k3 and place result in k1.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2     Operand 3
RVR    ModRM:reg (w)  VEX.vvvv (r)  ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Adds the vector mask k2 and the vector mask k3, and writes the result into vector mask k1.

Operation 

KADDW

DEST[15:0] ← SRC1[15:0] + SRC2[15:0]
DEST[MAX_KL-1:16] ← 0

KADDB

DEST[7:0] ← SRC1[7:0] + SRC2[7:0]
DEST[MAX_KL-1:8] ← 0

KADDQ

DEST[63:0] ← SRC1[63:0] + SRC2[63:0]
DEST[MAX_KL-1:64] ← 0

KADDD

DEST[31:0] ← SRC1[31:0] + SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
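The C emulation below models a mask register as a uint64_t (MAX_KL = 64). The sum is computed modulo 2^width, so any carry out of the top bit of the selected width is discarded, and the bits above the width are cleared. The helper names are hypothetical.

```c
#include <stdint.h>

/* Mask covering the low `width` bits (width = 8, 16, 32 or 64). */
static uint64_t width_mask(unsigned width)
{
    return (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
}

/* KADDB/KADDW/KADDD/KADDQ: DEST[width-1:0] <- SRC1 + SRC2 (carry out discarded),
 * DEST[MAX_KL-1:width] <- 0. */
uint64_t kadd(uint64_t src1, uint64_t src2, unsigned width)
{
    return (src1 + src2) & width_mask(width);
}
```

For example, a KADDB of 0xF0 and 0x20 gives 0x10, not 0x110.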

Intel C/C++ Compiler Intrinsic Equivalent 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KANDW/KANDB/KANDQ/KANDD-Bitwise Logical AND Masks


Opcode/Instruction                        Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
VEX.NDS.L1.0F.W0 41 /r KANDW k1, k2, k3   RVR    V/V                     AVX512F             Bitwise AND 16 bits masks k2 and k3 and place result in k1.
VEX.L1.66.0F.W0 41 /r KANDB k1, k2, k3    RVR    V/V                     AVX512DQ            Bitwise AND 8 bits masks k2 and k3 and place result in k1.
VEX.L1.0F.W1 41 /r KANDQ k1, k2, k3       RVR    V/V                     AVX512BW            Bitwise AND 64 bits masks k2 and k3 and place result in k1.
VEX.L1.66.0F.W1 41 /r KANDD k1, k2, k3    RVR    V/V                     AVX512BW            Bitwise AND 32 bits masks k2 and k3 and place result in k1.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2     Operand 3
RVR    ModRM:reg (w)  VEX.vvvv (r)  ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise AND between the vector mask k2 and the vector mask k3, and writes the result into vector mask
k1.
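Each mask bit typically stands for one vector element, so ANDing two masks selects the elements that satisfy both conditions. The emulation below models a mask register as a uint64_t (MAX_KL = 64), and the function name is hypothetical. Bits above the selected width are cleared in the result.

```c
#include <stdint.h>

/* KANDB/KANDW/KANDD/KANDQ on masks modeled as uint64_t (width = 8/16/32/64). */
uint64_t kand(uint64_t src1, uint64_t src2, unsigned width)
{
    uint64_t lane_mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return (src1 & src2) & lane_mask;   /* DEST[MAX_KL-1:width] <- 0 */
}
```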

Operation 

KANDW

DEST[15:0] ← SRC1[15:0] BITWISE AND SRC2[15:0]
DEST[MAX_KL-1:16] ← 0

KANDB

DEST[7:0] ← SRC1[7:0] BITWISE AND SRC2[7:0]
DEST[MAX_KL-1:8] ← 0

KANDQ

DEST[63:0] ← SRC1[63:0] BITWISE AND SRC2[63:0]
DEST[MAX_KL-1:64] ← 0

KANDD

DEST[31:0] ← SRC1[31:0] BITWISE AND SRC2[31:0]
DEST[MAX_KL-1:32] ← 0

Intel C/C++ Compiler Intrinsic Equivalent 

KANDW __mmask16 _mm512_kand(__mmask16 a, __mmask16 b);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KANDNW/KANDNB/KANDNQ/KANDND-Bitwise Logical AND NOT Masks

Opcode/Instruction                        Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
VEX.NDS.L1.0F.W0 42 /r KANDNW k1, k2, k3  RVR    V/V                     AVX512F             Bitwise AND NOT 16 bits masks k2 and k3 and place result in k1.
VEX.L1.66.0F.W0 42 /r KANDNB k1, k2, k3   RVR    V/V                     AVX512DQ            Bitwise AND NOT 8 bits masks k2 and k3 and place result in k1.
VEX.L1.0F.W1 42 /r KANDNQ k1, k2, k3      RVR    V/V                     AVX512BW            Bitwise AND NOT 64 bits masks k2 and k3 and place result in k1.
VEX.L1.66.0F.W1 42 /r KANDND k1, k2, k3   RVR    V/V                     AVX512BW            Bitwise AND NOT 32 bits masks k2 and k3 and place result in k1.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2     Operand 3
RVR    ModRM:reg (w)  VEX.vvvv (r)  ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise AND NOT between the vector mask k2 and the vector mask k3, and writes the result into vector
mask k1.

Operation 

KANDNW

DEST[15:0] ← (BITWISE NOT SRC1[15:0]) BITWISE AND SRC2[15:0]
DEST[MAX_KL-1:16] ← 0

KANDNB

DEST[7:0] ← (BITWISE NOT SRC1[7:0]) BITWISE AND SRC2[7:0]
DEST[MAX_KL-1:8] ← 0

KANDNQ

DEST[63:0] ← (BITWISE NOT SRC1[63:0]) BITWISE AND SRC2[63:0]
DEST[MAX_KL-1:64] ← 0

KANDND

DEST[31:0] ← (BITWISE NOT SRC1[31:0]) BITWISE AND SRC2[31:0]
DEST[MAX_KL-1:32] ← 0
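As the operation above shows, the NOT applies only to SRC1, so the operand order matters. SRC1 is the VEX.vvvv operand (k2) and SRC2 is the ModRM:r/m operand (k3). The emulation below models a mask register as a uint64_t, and the function name is hypothetical.

```c
#include <stdint.h>

/* KANDN: DEST <- (NOT SRC1) AND SRC2, truncated to `width` bits. */
uint64_t kandn(uint64_t src1, uint64_t src2, unsigned width)
{
    uint64_t lane_mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return (~src1 & src2) & lane_mask;
}
```

Swapping the sources changes the result. For example, `kandn(0x0F, 0xFF, 8)` yields 0xF0, while `kandn(0xFF, 0x0F, 8)` yields 0.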

Intel C/C++ Compiler Intrinsic Equivalent 

KANDNW __mmask16 _mm512_kandn(__mmask16 a, __mmask16 b);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KMOVW/KMOVB/KMOVQ/KMOVD-Move from and to Mask Registers


Opcode/Instruction                        Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
VEX.L0.0F.W0 90 /r KMOVW k1, k2/m16       RM     V/V                     AVX512F             Move 16 bits mask from k2/m16 and store the result in k1.
VEX.L0.66.0F.W0 90 /r KMOVB k1, k2/m8     RM     V/V                     AVX512DQ            Move 8 bits mask from k2/m8 and store the result in k1.
VEX.L0.0F.W1 90 /r KMOVQ k1, k2/m64       RM     V/V                     AVX512BW            Move 64 bits mask from k2/m64 and store the result in k1.
VEX.L0.66.0F.W1 90 /r KMOVD k1, k2/m32    RM     V/V                     AVX512BW            Move 32 bits mask from k2/m32 and store the result in k1.
VEX.L0.0F.W0 91 /r KMOVW m16, k1          MR     V/V                     AVX512F             Move 16 bits mask from k1 and store the result in m16.
VEX.L0.66.0F.W0 91 /r KMOVB m8, k1        MR     V/V                     AVX512DQ            Move 8 bits mask from k1 and store the result in m8.
VEX.L0.0F.W1 91 /r KMOVQ m64, k1          MR     V/V                     AVX512BW            Move 64 bits mask from k1 and store the result in m64.
VEX.L0.66.0F.W1 91 /r KMOVD m32, k1       MR     V/V                     AVX512BW            Move 32 bits mask from k1 and store the result in m32.
VEX.L0.0F.W0 92 /r KMOVW k1, r32          RR     V/V                     AVX512F             Move 16 bits mask from r32 to k1.
VEX.L0.66.0F.W0 92 /r KMOVB k1, r32       RR     V/V                     AVX512DQ            Move 8 bits mask from r32 to k1.
VEX.L0.F2.0F.W1 92 /r KMOVQ k1, r64       RR     V/I                     AVX512BW            Move 64 bits mask from r64 to k1.
VEX.L0.F2.0F.W0 92 /r KMOVD k1, r32       RR     V/V                     AVX512BW            Move 32 bits mask from r32 to k1.
VEX.L0.0F.W0 93 /r KMOVW r32, k1          RR     V/V                     AVX512F             Move 16 bits mask from k1 to r32.
VEX.L0.66.0F.W0 93 /r KMOVB r32, k1       RR     V/V                     AVX512DQ            Move 8 bits mask from k1 to r32.
VEX.L0.F2.0F.W1 93 /r KMOVQ r64, k1       RR     V/I                     AVX512BW            Move 64 bits mask from k1 to r64.
VEX.L0.F2.0F.W0 93 /r KMOVD r32, k1       RR     V/V                     AVX512BW            Move 32 bits mask from k1 to r32.


Instruction Operand Encoding 


Op/En  Operand 1                                   Operand 2
RM     ModRM:reg (w)                               ModRM:r/m (r)
MR     ModRM:r/m (w, ModRM:[7:6] must not be 11b)  ModRM:reg (r)
RR     ModRM:reg (w)                               ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Copies values from the source operand (second operand) to the destination operand (first operand). Each operand
can be a mask register, a memory location, or a general-purpose register. The instruction cannot be used to
transfer data between general-purpose registers and/or memory locations; one of the operands must be a mask register.

When moving to a mask register, the result is zero-extended to MAX_KL size (i.e., 64 bits currently). When moving
to a general-purpose register (GPR), the result is zero-extended to the size of the destination. In 32-bit mode, the
default GPR destination size is 32 bits. In 64-bit mode, the default GPR destination size is 64 bits. Note that
REX.W cannot be used to modify the size of the general-purpose destination.
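The sketch below contrasts register and memory destinations for KMOVB. It assumes a model in which mask registers and 64-bit GPRs are uint64_t values (MAX_KL = 64), and the function names are hypothetical. A move to a register zero-extends the 8-bit value. A store to memory writes exactly one byte and leaves the neighboring bytes untouched.

```c
#include <stdint.h>

/* KMOVB to a mask register or GPR: zero-extend SRC[7:0] to the full register. */
uint64_t kmovb_to_reg(uint64_t src)
{
    return (uint64_t)(uint8_t)src;
}

/* KMOVB m8, k1: only DEST[7:0] (one byte of memory) is written. */
void kmovb_to_mem(uint8_t *mem, uint64_t k)
{
    mem[0] = (uint8_t)k;
}
```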
Operation

KMOVW

IF *destination is a memory location*
    DEST[15:0] ← SRC[15:0]
IF *destination is a mask register or a GPR*
    DEST ← ZeroExtension(SRC[15:0])

KMOVB

IF *destination is a memory location*
    DEST[7:0] ← SRC[7:0]
IF *destination is a mask register or a GPR*
    DEST ← ZeroExtension(SRC[7:0])

KMOVQ

IF *destination is a memory location or a GPR*
    DEST[63:0] ← SRC[63:0]
IF *destination is a mask register*
    DEST ← ZeroExtension(SRC[63:0])

KMOVD

IF *destination is a memory location*
    DEST[31:0] ← SRC[31:0]
IF *destination is a mask register or a GPR*
    DEST ← ZeroExtension(SRC[31:0])

Intel C/C++ Compiler Intrinsic Equivalent 

KMOVW __mmask16 _mm512_kmov(__mmask16 a);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

Instructions with RR operand encoding: see Exceptions Type K20.

Instructions with RM or MR operand encoding: see Exceptions Type K21.
KNOTW/KNOTB/KNOTQ/KNOTD-NOT Mask Register


Opcode/Instruction                        Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
VEX.L0.0F.W0 44 /r KNOTW k1, k2           RR     V/V                     AVX512F             Bitwise NOT of 16 bits mask k2.
VEX.L0.66.0F.W0 44 /r KNOTB k1, k2        RR     V/V                     AVX512DQ            Bitwise NOT of 8 bits mask k2.
VEX.L0.0F.W1 44 /r KNOTQ k1, k2           RR     V/V                     AVX512BW            Bitwise NOT of 64 bits mask k2.
VEX.L0.66.0F.W1 44 /r KNOTD k1, k2        RR     V/V                     AVX512BW            Bitwise NOT of 32 bits mask k2.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2
RR     ModRM:reg (w)  ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise NOT of vector mask k2 and writes the result into vector mask k1.

Operation 

KNOTW

DEST[15:0] ← BITWISE NOT SRC[15:0]
DEST[MAX_KL-1:16] ← 0

KNOTB

DEST[7:0] ← BITWISE NOT SRC[7:0]
DEST[MAX_KL-1:8] ← 0

KNOTQ

DEST[63:0] ← BITWISE NOT SRC[63:0]
DEST[MAX_KL-1:64] ← 0

KNOTD

DEST[31:0] ← BITWISE NOT SRC[31:0]
DEST[MAX_KL-1:32] ← 0
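The C emulation below models a mask register as a uint64_t, and the function name is hypothetical. The step that matters is clearing the bits above the selected width: a plain `~src` would leave them set.

```c
#include <stdint.h>

/* KNOTB/KNOTW/KNOTD/KNOTQ: DEST[width-1:0] <- NOT SRC[width-1:0];
 * DEST[MAX_KL-1:width] <- 0. */
uint64_t knot(uint64_t src, unsigned width)
{
    uint64_t lane_mask = (width == 64) ? UINT64_MAX : ((UINT64_C(1) << width) - 1);
    return ~src & lane_mask;
}
```

With the zeroing step, a KNOTB of 0x0F produces 0xF0 rather than 0xFFFFFFFFFFFFFFF0.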

Intel C/C++ Compiler Intrinsic Equivalent 

KNOTW __mmask16 _mm512_knot(__mmask16 a);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KORW/KORB/KORQ/KORD-Bitwise Logical OR Masks


Opcode/Instruction                        Op/En  64/32 bit Mode Support  CPUID Feature Flag  Description
VEX.NDS.L1.0F.W0 45 /r KORW k1, k2, k3    RVR    V/V                     AVX512F             Bitwise OR 16 bits masks k2 and k3 and place result in k1.
VEX.L1.66.0F.W0 45 /r KORB k1, k2, k3     RVR    V/V                     AVX512DQ            Bitwise OR 8 bits masks k2 and k3 and place result in k1.
VEX.L1.0F.W1 45 /r KORQ k1, k2, k3        RVR    V/V                     AVX512BW            Bitwise OR 64 bits masks k2 and k3 and place result in k1.
VEX.L1.66.0F.W1 45 /r KORD k1, k2, k3     RVR    V/V                     AVX512BW            Bitwise OR 32 bits masks k2 and k3 and place result in k1.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2     Operand 3
RVR    ModRM:reg (w)  VEX.vvvv (r)  ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise OR between the vector mask k2 and the vector mask k3, and writes the result into vector mask
k1 (three-operand form).
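Each mask bit typically stands for one vector element, so KOR computes the union of two element sets. In the sketch below, `mask_negative` and `mask_zero` are hypothetical stand-ins for AVX-512 compare-into-mask instructions. They operate on plain arrays so the example runs on any processor. `korb` applies the KORB semantics from the operation below.

```c
#include <stdint.h>

/* Hypothetical stand-in for a compare-into-mask: bit i set if v[i] < 0. */
uint8_t mask_negative(const int32_t v[8])
{
    uint8_t m = 0;
    for (int i = 0; i < 8; i++)
        if (v[i] < 0)
            m |= (uint8_t)(1u << i);
    return m;
}

/* Hypothetical stand-in: bit i set if v[i] == 0. */
uint8_t mask_zero(const int32_t v[8])
{
    uint8_t m = 0;
    for (int i = 0; i < 8; i++)
        if (v[i] == 0)
            m |= (uint8_t)(1u << i);
    return m;
}

/* KORB on 8-bit masks; mask-register bits above bit 7 are zero. */
uint64_t korb(uint8_t src1, uint8_t src2)
{
    return (uint64_t)(uint8_t)(src1 | src2);
}
```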

Operation 

KORW

DEST[15:0] ← SRC1[15:0] BITWISE OR SRC2[15:0]

DEST[MAX_KL-1:16] ← 0

KORB

DEST[7:0] ← SRC1[7:0] BITWISE OR SRC2[7:0]

DEST[MAX_KL-1:8] ← 0

KORQ

DEST[63:0] ← SRC1[63:0] BITWISE OR SRC2[63:0]

DEST[MAX_KL-1:64] ← 0

KORD

DEST[31:0] ← SRC1[31:0] BITWISE OR SRC2[31:0]

DEST[MAX_KL-1:32] ← 0
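The pseudocode above maps directly onto a C model. This is a minimal sketch that assumes MAX_KL = 64. `low_mask` and `kor` are hypothetical names, not Intel intrinsics. In the sketch, `src1` stands for k2 and `src2` for k3.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t low_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KORB/W/D/Q k1, k2, k3: DEST[width-1:0] <- SRC1 OR SRC2; higher bits <- 0. */
static uint64_t kor(uint64_t src1, uint64_t src2, unsigned width) {
    return (src1 | src2) & low_mask(width);
}
```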

Intel C/C++ Compiler Intrinsic Equivalent 

KORW __mmask16 _mm512_kor(__mmask16 a, __mmask16 b);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KORTESTW/KORTESTB/KORTESTQ/KORTESTD-OR Masks And Set Flags

Opcode/ 

Instruction 

Op/ 

En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.L0.0F.W0 98 /r
KORTESTW k1, k2

RR

V/V

AVX512F

Bitwise OR 16-bit masks k1 and k2 and update ZF and CF accordingly.

VEX.L0.66.0F.W0 98 /r 
KORTESTB k1, k2

RR

V/V

AVX512DQ

Bitwise OR 8-bit masks k1 and k2 and update ZF and CF accordingly.

VEX.L0.0F.W1 98 /r 
KORTESTQ k1, k2

RR

V/V

AVX512BW

Bitwise OR 64-bit masks k1 and k2 and update ZF and CF accordingly.

VEX.L0.66.0F.W1 98 /r 
KORTESTD k1, k2

RR

V/V

AVX512BW

Bitwise OR 32-bit masks k1 and k2 and update ZF and CF accordingly.


Instruction Operand 

Encoding 

Op/En 

Operand 1 

Operand 2 

RR 

ModRM:reg (w) 

ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise OR between the vector mask registers k1 and k2, and sets ZF and CF based on the
result.

ZF is set if the result of the OR is all 0s (that is, both sources are 0). CF is set if the result of the OR is all 1s.

Operation 

KORTESTW

TMP[15:0] ← DEST[15:0] BITWISE OR SRC[15:0]

IF (TMP[15:0] = 0)
    THEN ZF ← 1
    ELSE ZF ← 0
FI;

IF (TMP[15:0] = FFFFh)
    THEN CF ← 1
    ELSE CF ← 0
FI;

KORTESTB

TMP[7:0] ← DEST[7:0] BITWISE OR SRC[7:0]

IF (TMP[7:0] = 0)
    THEN ZF ← 1
    ELSE ZF ← 0
FI;

IF (TMP[7:0] = FFh)
    THEN CF ← 1
    ELSE CF ← 0
FI;
KORTESTQ

TMP[63:0] ← DEST[63:0] BITWISE OR SRC[63:0]

IF (TMP[63:0] = 0)
    THEN ZF ← 1
    ELSE ZF ← 0
FI;

IF (TMP[63:0] = FFFFFFFF_FFFFFFFFh)
    THEN CF ← 1
    ELSE CF ← 0
FI;

KORTESTD

TMP[31:0] ← DEST[31:0] BITWISE OR SRC[31:0]

IF (TMP[31:0] = 0)
    THEN ZF ← 1
    ELSE ZF ← 0
FI;

IF (TMP[31:0] = FFFFFFFFh)
    THEN CF ← 1
    ELSE CF ← 0
FI;
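Here is a portable C sketch of the flag logic above, assuming MAX_KL = 64. `kortest_flags` and `kortest` are hypothetical names. The sketch models only ZF and CF, because the instruction clears OF, SF, AF and PF unconditionally.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf; bool cf; } kortest_flags;

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t width_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KORTESTB/W/D/Q: TMP <- DEST OR SRC over the low `width` bits;
   ZF <- (TMP = 0), CF <- (TMP = all ones). */
static kortest_flags kortest(uint64_t dest, uint64_t src, unsigned width) {
    uint64_t all = width_mask(width);
    uint64_t tmp = (dest | src) & all;
    kortest_flags f = { tmp == 0, tmp == all };
    return f;
}
```

Because ZF reports "no bit set in either mask", a test of this kind is a natural exit condition for a loop that runs until no lane of a mask remains active.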

Intel C/C++ Compiler Intrinsic Equivalent 

KORTESTW __mmask16 _mm512_kortest[cz](__mmask16 a, __mmask16 b);

Flags Affected 

The ZF flag is set if the result of OR-ing both sources is all 0s.

The CF flag is set if the result of OR-ing both sources is all 1s.

The OF, SF, AF, and PF flags are set to 0. 

Other Exceptions 

See Exceptions Type K20. 
INSTRUCTION SET REFERENCE, A-L


KSHIFTLW/KSHIFTLB/KSHIFTLQ/KSHIFTLD-Shift Left Mask Registers 


Opcode/ 

Instruction 

Op/En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.L0.66.0F3A.W1 32 /r 
KSHIFTLW k1,k2, imm8 

RRI 

V/V 

AVX512F 

Shift left 16 bits in k2 by immediate and write result in k1. 

VEX.L0.66.0F3A.W0 32 /r 
KSHIFTLB k1,k2, imm8 

RRI 

V/V

AVX512DQ 

Shift left 8 bits in k2 by immediate and write result in k1. 

VEX.L0.66.0F3A.W1 33 /r 
KSHIFTLQ k1, k2, imm8

RRI 

V/V 

AVX512BW 

Shift left 64 bits in k2 by immediate and write result in k1. 

VEX.L0.66.0F3A.W0 33 /r 
KSHIFTLD k1, k2, imm8

RRI

V/V

AVX512BW 

Shift left 32 bits in k2 by immediate and write result in k1. 


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

RRI 

ModRM:reg (w) 

ModRM:r/m (r, ModRM:[7:6] must be 11b)

Imm8


Description 

Shifts 8/16/32/64 bits in the second operand (source operand) left by the count specified in the immediate byte and
places the least significant 8/16/32/64 bits of the result in the destination operand. The higher bits of the destination
are zero-extended. The destination is set to zero if the count value is greater than 7 (for byte shift), 15 (for
word shift), 31 (for doubleword shift) or 63 (for quadword shift).

Operation 

KSHIFTLW

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 15
    THEN DEST[15:0] ← SRC1[15:0] << COUNT;
FI;

KSHIFTLB

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 7
    THEN DEST[7:0] ← SRC1[7:0] << COUNT;
FI;

KSHIFTLQ

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 63
    THEN DEST[63:0] ← SRC1[63:0] << COUNT;
FI;
KSHIFTLD

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 31
    THEN DEST[31:0] ← SRC1[31:0] << COUNT;
FI;
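This is a portable C model of the KSHIFTL Operation above. It assumes MAX_KL = 64 and uses hypothetical helper names, not Intel intrinsics. The key rule it shows is that an out-of-range count zeroes the destination instead of wrapping.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t low_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KSHIFTLB/W/D/Q k1, k2, imm8 with width = 8/16/32/64. */
static uint64_t kshiftl(uint64_t src, uint8_t imm8, unsigned width) {
    uint64_t all = low_mask(width);
    if (imm8 > width - 1)      /* COUNT too large: DEST stays all zeros */
        return 0;
    /* Bits shifted past bit width-1 are dropped by the final AND;
       imm8 <= 63 here, so the C shift is always well defined. */
    return (src << imm8) & all;
}
```

This differs from the general-purpose SHL instruction, which masks its count. KSHIFTL clears the destination instead.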

Intel C/C++ Compiler Intrinsic Equivalent 

The compiler automatically generates KSHIFTLW when needed.

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 


KSHIFTRW/KSHIFTRB/KSHIFTRQ/KSHIFTRD-Shift Right Mask Registers 


Opcode/ 

Instruction 

Op/En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.L0.66.0F3A.W1 30 /r 
KSHIFTRW k1,k2, imm8 

RRI 

V/V 

AVX512F 

Shift right 16 bits in k2 by immediate and write result in k1. 

VEX.L0.66.0F3A.W0 30 /r 
KSHIFTRB k1,k2, imm8 

RRI 

V/V

AVX512DQ 

Shift right 8 bits in k2 by immediate and write result in k1. 

VEX.L0.66.0F3A.W1 31 /r 
KSHIFTRQ k1,k2, imm8 

RRI 

V/V 

AVX512BW 

Shift right 64 bits in k2 by immediate and write result in k1. 

VEX.L0.66.0F3A.W0 31 /r 
KSHIFTRD k1, k2, imm8

RRI

V/V

AVX512BW 

Shift right 32 bits in k2 by immediate and write result in k1. 


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

RRI 

ModRM:reg (w) 

ModRM:r/m (r, ModRM:[7:6] must be 11b)

Imm8 


Description 

Shifts 8/16/32/64 bits in the second operand (source operand) right by the count specified in the immediate byte and places
the least significant 8/16/32/64 bits of the result in the destination operand. The higher bits of the destination are
zero-extended. The destination is set to zero if the count value is greater than 7 (for byte shift), 15 (for word shift),
31 (for doubleword shift) or 63 (for quadword shift).

Operation 

KSHIFTRW

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 15
    THEN DEST[15:0] ← SRC1[15:0] >> COUNT;
FI;

KSHIFTRB

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 7
    THEN DEST[7:0] ← SRC1[7:0] >> COUNT;
FI;

KSHIFTRQ

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 63
    THEN DEST[63:0] ← SRC1[63:0] >> COUNT;
FI;
KSHIFTRD

COUNT ← imm8[7:0]

DEST[MAX_KL-1:0] ← 0
IF COUNT <= 31
    THEN DEST[31:0] ← SRC1[31:0] >> COUNT;
FI;
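The right-shift variant can be modelled the same way. This is a sketch under the same assumptions (MAX_KL = 64, hypothetical names). Only SRC1[width-1:0] takes part in the shift, so the model masks the source before shifting; otherwise, stray higher bits could move down into the result.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t low_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KSHIFTRB/W/D/Q k1, k2, imm8 with width = 8/16/32/64. */
static uint64_t kshiftr(uint64_t src, uint8_t imm8, unsigned width) {
    uint64_t all = low_mask(width);
    if (imm8 > width - 1)      /* COUNT too large: DEST stays all zeros */
        return 0;
    /* Mask first: only SRC1[width-1:0] is shifted. */
    return (src & all) >> imm8;
}
```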

Intel C/C++ Compiler Intrinsic Equivalent 

The compiler automatically generates KSHIFTRW when needed.

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 


KTESTW/KTESTB/KTESTQ/KTESTD-Packed Bit Test Masks and Set Flags 


Opcode/ 

Instruction 

Op 

En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.L0.0F.W0 99 /r
KTESTW k1, k2

RR

V/V

AVX512DQ

Set ZF and CF depending on the bitwise AND and ANDN of the 16-bit mask
register sources.

VEX.L0.66.0F.W0 99 /r 
KTESTB k1, k2 

RR 

V/V

AVX512DQ

Set ZF and CF depending on the bitwise AND and ANDN of the 8-bit mask
register sources.

VEX.L0.0F.W1 99 /r 
KTESTQ k1, k2

RR

V/V

AVX512BW

Set ZF and CF depending on the bitwise AND and ANDN of the 64-bit mask
register sources.

VEX.L0.66.0F.W1 99 /r 
KTESTD k1, k2 

RR 

V/V

AVX512BW

Set ZF and CF depending on the bitwise AND and ANDN of the 32-bit mask
register sources.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2

RR 

ModRM:reg (r) 

ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise comparison of the bits of the first source operand and the corresponding bits in the second source
operand. If the bitwise AND of the two operands produces all zeros, ZF is set; otherwise ZF is cleared. If the bitwise AND of
the inverted first source operand with the second source operand produces all zeros, CF is set; otherwise CF is
cleared. Only the EFLAGS register is updated.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. 

Operation 

KTESTW

TEMP[15:0] ← SRC2[15:0] AND SRC1[15:0]
IF (TEMP[15:0] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
FI;

TEMP[15:0] ← SRC2[15:0] AND NOT SRC1[15:0]
IF (TEMP[15:0] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
FI;

AF ← OF ← PF ← SF ← 0;

KTESTB

TEMP[7:0] ← SRC2[7:0] AND SRC1[7:0]
IF (TEMP[7:0] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
FI;

TEMP[7:0] ← SRC2[7:0] AND NOT SRC1[7:0]
IF (TEMP[7:0] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
FI;

AF ← OF ← PF ← SF ← 0;
KTESTQ

TEMP[63:0] ← SRC2[63:0] AND SRC1[63:0]
IF (TEMP[63:0] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
FI;

TEMP[63:0] ← SRC2[63:0] AND NOT SRC1[63:0]
IF (TEMP[63:0] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
FI;

AF ← OF ← PF ← SF ← 0;

KTESTD

TEMP[31:0] ← SRC2[31:0] AND SRC1[31:0]
IF (TEMP[31:0] = 0)
    THEN ZF ← 1;
    ELSE ZF ← 0;
FI;

TEMP[31:0] ← SRC2[31:0] AND NOT SRC1[31:0]
IF (TEMP[31:0] = 0)
    THEN CF ← 1;
    ELSE CF ← 0;
FI;

AF ← OF ← PF ← SF ← 0;
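This portable C sketch models the KTEST Operation above, assuming MAX_KL = 64. `ktest_flags` and `ktest` are hypothetical names, and AF, OF, PF and SF (always cleared) are omitted. In the sketch, `src1` is the first operand (k1) and `src2` is the second (k2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct { bool zf; bool cf; } ktest_flags;

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t low_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KTESTB/W/D/Q: ZF and CF only. */
static ktest_flags ktest(uint64_t src1, uint64_t src2, unsigned width) {
    uint64_t all = low_mask(width);
    ktest_flags f;
    f.zf = ((src2 & src1) & all) == 0;    /* SRC2 AND SRC1 is all zeros */
    f.cf = ((src2 & ~src1) & all) == 0;   /* SRC2 AND NOT SRC1 is all zeros */
    return f;
}
```

Read as set operations, ZF = 1 means the two masks are disjoint. CF = 1 means every bit set in SRC2 is also set in SRC1.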

Intel C/C++ Compiler Intrinsic Equivalent 


SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 


KUNPCKBW/KUNPCKWD/KUNPCKDQ-Unpack for Mask Registers 


Opcode/ 

Instruction 

Op/En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.NDS.L1.66.0F.W0 4B /r 
KUNPCKBW k1,k2, k3 

RVR 

V/V 

AVX512F 

Unpack and interleave 8-bit masks in k2 and k3 and write
word result in k1.

VEX.NDS.L1.0F.W0 4B /r
KUNPCKWD k1, k2, k3

RVR

V/V

AVX512BW

Unpack and interleave 16-bit masks in k2 and k3 and write doubleword
result in k1.

VEX.NDS.L1.0F.W1 4B /r
KUNPCKDQ k1, k2, k3

RVR

V/V

AVX512BW

Unpack and interleave 32-bit masks in k2 and k3 and write
quadword result in k1.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

RVR 

ModRM:reg (w) 

VEX.vvvv (r)

ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Unpacks the lower 8/16/32 bits of the second and third operands (source operands) into the low part of the first
operand (destination operand). The bits from the third operand fill the lower half of the result and the bits from the
second operand fill the upper half. The result is zero-extended in the destination.

Operation 

KUNPCKBW

DEST[7:0] ← SRC2[7:0]

DEST[15:8] ← SRC1[7:0]

DEST[MAX_KL-1:16] ← 0

KUNPCKWD

DEST[15:0] ← SRC2[15:0]

DEST[31:16] ← SRC1[15:0]

DEST[MAX_KL-1:32] ← 0

KUNPCKDQ

DEST[31:0] ← SRC2[31:0]

DEST[63:32] ← SRC1[31:0]

DEST[MAX_KL-1:64] ← 0
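Despite the "interleave" in the table, the Operation section describes a plain concatenation of two halves. This minimal C sketch assumes MAX_KL = 64; `kunpck` is a hypothetical name. `src1` models the second operand (k2) and `src2` the third (k3).

```c
#include <assert.h>
#include <stdint.h>

/* half = 8 (KUNPCKBW), 16 (KUNPCKWD) or 32 (KUNPCKDQ). */
static uint64_t kunpck(uint64_t src1, uint64_t src2, unsigned half) {
    assert(half == 8 || half == 16 || half == 32);
    uint64_t low = (UINT64_C(1) << half) - 1;
    /* DEST[2*half-1:half] <- SRC1[half-1:0]; DEST[half-1:0] <- SRC2[half-1:0].
       Bits above 2*half-1 are zero because both halves are masked first. */
    return ((src1 & low) << half) | (src2 & low);
}
```

For example, `kunpck(0xAB, 0xCD, 8)` gives 0xABCD: k2 supplies the high byte and k3 the low byte.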

Intel C/C++ Compiler Intrinsic Equivalent 

KUNPCKBW __mmask16 _mm512_kunpackb(__mmask16 a, __mmask16 b);

KUNPCKDQ __mmask64 _mm512_kunpackd(__mmask64 a, __mmask64 b);

KUNPCKWD __mmask32 _mm512_kunpackw(__mmask32 a, __mmask32 b);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KXNORW/KXNORB/KXNORQ/KXNORD-Bitwise Logical XNOR Masks


Opcode/ 

Instruction 

Op/En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.NDS.L1.0F.W0 46 /r
KXNORW k1, k2, k3

RVR

V/V

AVX512F

Bitwise XNOR 16-bit masks k2 and k3 and place result in k1.

VEX.L1.66.0F.W0 46 /r 
KXNORB k1,k2, k3 

RVR 

V/V

AVX512DQ

Bitwise XNOR 8-bit masks k2 and k3 and place result in k1.

VEX.L1.0F.W1 46 /r 

KXNORQ k1,k2, k3 

RVR 

V/V 

AVX512BW 

Bitwise XNOR 64-bit masks k2 and k3 and place result in k1.

VEX.L1.66.0F.W1 46 /r 
KXNORD k1,k2, k3 

RVR 

V/V

AVX512BW

Bitwise XNOR 32-bit masks k2 and k3 and place result in k1.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

RVR 

ModRM:reg (w) 

VEX.vvvv (r)

ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise XNOR between the vector mask k2 and the vector mask k3, and writes the result into vector
mask k1 (three-operand form).

Operation 

KXNORW

DEST[15:0] ← NOT (SRC1[15:0] BITWISE XOR SRC2[15:0])

DEST[MAX_KL-1:16] ← 0

KXNORB

DEST[7:0] ← NOT (SRC1[7:0] BITWISE XOR SRC2[7:0])

DEST[MAX_KL-1:8] ← 0

KXNORQ

DEST[63:0] ← NOT (SRC1[63:0] BITWISE XOR SRC2[63:0])

DEST[MAX_KL-1:64] ← 0

KXNORD

DEST[31:0] ← NOT (SRC1[31:0] BITWISE XOR SRC2[31:0])

DEST[MAX_KL-1:32] ← 0
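The following minimal C sketch models KXNOR, assuming MAX_KL = 64 and using hypothetical names. A bit of the result is 1 exactly where the two sources agree. As a consequence, the XNOR of a mask with itself is all ones within the operation width, which makes this instruction a convenient way to produce an all-ones mask.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t op_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KXNORB/W/D/Q: DEST[width-1:0] <- NOT (SRC1 XOR SRC2); higher bits <- 0. */
static uint64_t kxnor(uint64_t src1, uint64_t src2, unsigned width) {
    return ~(src1 ^ src2) & op_mask(width);
}
```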

Intel C/C++ Compiler Intrinsic Equivalent 

KXNORW __mmask16 _mm512_kxnor(__mmask16 a, __mmask16 b);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
KXORW/KXORB/KXORQ/KXORD-Bitwise Logical XOR Masks


Opcode/ 

Instruction 

Op/En 

64/32 
bit Mode 
Support 

CPUID 

Feature 

Flag 

Description 

VEX.NDS.L1.0F.W0 47 /r
KXORW k1, k2, k3

RVR

V/V

AVX512F

Bitwise XOR 16-bit masks k2 and k3 and place result in k1.

VEX.L1.66.0F.W0 47 /r 

KXORB k1,k2, k3 

RVR 

V/V

AVX512DQ

Bitwise XOR 8-bit masks k2 and k3 and place result in k1.

VEX.L1.0F.W1 47 /r 

KXORQ k1,k2, k3 

RVR 

V/V 

AVX512BW 

Bitwise XOR 64-bit masks k2 and k3 and place result in k1.

VEX.L1.66.0F.W1 47 /r 

KXORD k1,k2, k3 

RVR 

V/V

AVX512BW

Bitwise XOR 32-bit masks k2 and k3 and place result in k1.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

RVR 

ModRM:reg (w) 

VEX.vvvv (r)

ModRM:r/m (r, ModRM:[7:6] must be 11b)


Description 

Performs a bitwise XOR between the vector mask k2 and the vector mask k3, and writes the result into vector mask
k1 (three-operand form).

Operation 

KXORW

DEST[15:0] ← SRC1[15:0] BITWISE XOR SRC2[15:0]

DEST[MAX_KL-1:16] ← 0

KXORB

DEST[7:0] ← SRC1[7:0] BITWISE XOR SRC2[7:0]

DEST[MAX_KL-1:8] ← 0

KXORQ

DEST[63:0] ← SRC1[63:0] BITWISE XOR SRC2[63:0]

DEST[MAX_KL-1:64] ← 0

KXORD

DEST[31:0] ← SRC1[31:0] BITWISE XOR SRC2[31:0]

DEST[MAX_KL-1:32] ← 0
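For completeness, this is the XOR counterpart as a portable C sketch, assuming MAX_KL = 64 and using hypothetical names. Result bits are set exactly where the two source masks differ, and everything above the operation width is cleared.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: MAX_KL = 64; a mask register is modelled as a uint64_t. */
static uint64_t op_width_mask(unsigned width) {
    assert(width == 8 || width == 16 || width == 32 || width == 64);
    return width == 64 ? UINT64_MAX : (UINT64_C(1) << width) - 1;
}

/* KXORB/W/D/Q: DEST[width-1:0] <- SRC1 XOR SRC2; higher bits <- 0. */
static uint64_t kxor(uint64_t src1, uint64_t src2, unsigned width) {
    return (src1 ^ src2) & op_width_mask(width);
}
```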

Intel C/C++ Compiler Intrinsic Equivalent 

KXORW __mmask16 _mm512_kxor(__mmask16 a, __mmask16 b);

Flags Affected 

None 

SIMD Floating-Point Exceptions 

None 

Other Exceptions 

See Exceptions Type K20. 
LAHF-Load Status Flags into AH Register


Opcode 

Instruction 

Op/ 

En 

64-Bit 

Mode 

Compat/ 
Leg Mode 

Description 

9F 

LAHF 

NP 

Invalid* 

Valid 

Load: AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF).


NOTES:


* Valid in specific steppings. See Description section.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

NP 

NA 

NA 

NA 

NA 


Description 

This instruction executes as described above in compatibility mode and legacy mode. It is valid in 64-bit mode only 
if CPUID.80000001H:ECX.LAHF-SAHF[bit0] = 1. 

Operation 

IF 64-Bit Mode
    THEN
        IF CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 1;
            THEN AH ← RFLAGS(SF:ZF:0:AF:0:PF:1:CF);
            ELSE #UD;
        FI;
    ELSE
        AH ← EFLAGS(SF:ZF:0:AF:0:PF:1:CF);
FI;
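The AH layout SF:ZF:0:AF:0:PF:1:CF is the low byte of EFLAGS with the reserved positions normalized. This portable C sketch computes the AH value that LAHF would produce from an EFLAGS image. `lahf_ah` and the bit-position names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bit positions that LAHF copies into AH. */
enum { CF_BIT = 0, PF_BIT = 2, AF_BIT = 4, ZF_BIT = 6, SF_BIT = 7 };

/* Returns SF:ZF:0:AF:0:PF:1:CF (AH bit 7 down to bit 0). */
static uint8_t lahf_ah(uint32_t eflags) {
    const uint32_t copied = (1u << SF_BIT) | (1u << ZF_BIT) | (1u << AF_BIT) |
                            (1u << PF_BIT) | (1u << CF_BIT);
    uint8_t ah = (uint8_t)((eflags & copied) | 0x02u); /* bit 1 always reads 1 */
    assert((ah & 0x28u) == 0);                         /* bits 5 and 3 are 0 */
    return ah;
}
```

For instance, with only ZF (bit 6) set in EFLAGS, the sketch yields 42H.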

Flags Affected 

None. The state of the flags in the EFLAGS register is not affected. 

Protected Mode Exceptions 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#UD If CPUID.80000001H:ECX.LAHF-SAHF[bit 0] = 0. 

If the LOCK prefix is used. 
LAR-Load Access Rights Byte


Opcode 

Instruction 

Op/ 

En 

64-Bit 

Mode 

Compat/ 
Leg Mode 

Description 

0F 02 /r

LAR r16, r16/m16

RM 

Valid 

Valid 

r16 ← access rights referenced by r16/m16

0F 02 /r

LAR reg, r32/m16¹

RM 

Valid 

Valid 

reg ← access rights referenced by r32/m16


NOTES: 


1. For all loads (regardless of source or destination sizing) only bits 15:0 are used. Other bits are ignored.


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

RM 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 


Description 

Loads the access rights from the segment descriptor specified by the second operand (source operand) into the 
first operand (destination operand) and sets the ZF flag in the flag register. The source operand (which can be a 
register or a memory location) contains the segment selector for the segment descriptor being accessed. If the 
source operand is a memory address, only 16 bits of data are accessed. The destination operand is a
general-purpose register.

The processor performs access checks as part of the loading process. Once loaded in the destination register,
software can perform additional checks on the access rights information.

The access rights for a segment descriptor include fields located in the second doubleword (bytes 4-7) of the 
segment descriptor. The following fields are loaded by the LAR instruction: 

• Bits 7:0 are returned as 0.

• Bits 11:8 return the segment type. 

• Bit 12 returns the S flag. 

• Bits 14:13 return the DPL. 

• Bit 15 returns the P flag. 

• The following fields are returned only if the operand size is greater than 16 bits: 

— Bits 19:16 are undefined. 

— Bit 20 returns the software-available bit in the descriptor. 

— Bit 21 returns the L flag. 

— Bit 22 returns the D/B flag. 

— Bit 23 returns the G flag. 

— Bits 31:24 are returned as 0. 
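The field list above amounts to a bit mask over the second doubleword of the descriptor. The C sketch below models that. It assumes the caller already has that doubleword (a hypothetical input; LAR itself reads the descriptor from the GDT or LDT after the checks described next). The sketch zeroes bits 19:16, which the hardware leaves undefined. The value 00CF9A00H in the usage check is an illustrative flat code-segment descriptor, not one taken from this manual.

```c
#include <assert.h>
#include <stdint.h>

/* `hi` is the second doubleword (bytes 4-7) of a segment descriptor. */
static uint32_t lar_value(uint32_t hi, unsigned operand_size) {
    assert(operand_size == 16 || operand_size == 32);
    if (operand_size == 16)
        return hi & 0x0000FF00u;   /* type, S, DPL, P; bits 7:0 are 0 */
    /* Adds AVL, L, D/B and G. Bits 19:16 are undefined on hardware and are
       modelled as 0 here; bits 31:24 and 7:0 are 0. */
    return hi & 0x00F0FF00u;
}

/* Field extractors for the value returned above. */
static unsigned ar_type(uint32_t ar)    { return (ar >> 8) & 0xFu; }
static unsigned ar_s(uint32_t ar)       { return (ar >> 12) & 0x1u; }
static unsigned ar_dpl(uint32_t ar)     { return (ar >> 13) & 0x3u; }
static unsigned ar_present(uint32_t ar) { return (ar >> 15) & 0x1u; }
```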

This instruction performs the following checks before it loads the access rights in the destination register: 

• Checks that the segment selector is not NULL. 

• Checks that the segment selector points to a descriptor that is within the limits of the GDT or LDT being
accessed.

• Checks that the descriptor type is valid for this instruction. All code and data segment descriptors are valid for 
(can be accessed with) the LAR instruction. The valid system segment and gate descriptor types are given in 
Table 3-52. 

• If the segment is not a conforming code segment, it checks that the specified segment descriptor is visible at
the CPL (that is, that the CPL and the RPL of the segment selector are less than or equal to the DPL of the segment
descriptor).

If the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no 
access rights are loaded in the destination operand. 
The LAR instruction can only be executed in protected mode and IA-32e mode.


Table 3-52. Segment and Gate Types 


Type  Protected Mode Name       Valid  IA-32e Mode Name          Valid
0     Reserved                  No     Reserved                  No
1     Available 16-bit TSS      Yes    Reserved                  No
2     LDT                       Yes    LDT                       No
3     Busy 16-bit TSS           Yes    Reserved                  No
4     16-bit call gate          Yes    Reserved                  No
5     16-bit/32-bit task gate   Yes    Reserved                  No
6     16-bit interrupt gate     No     Reserved                  No
7     16-bit trap gate          No     Reserved                  No
8     Reserved                  No     Reserved                  No
9     Available 32-bit TSS      Yes    Available 64-bit TSS      Yes
A     Reserved                  No     Reserved                  No
B     Busy 32-bit TSS           Yes    Busy 64-bit TSS           Yes
C     32-bit call gate          Yes    64-bit call gate          Yes
D     Reserved                  No     Reserved                  No
E     32-bit interrupt gate     No     64-bit interrupt gate     No
F     32-bit trap gate          No     64-bit trap gate          No


Operation 

IF Offset(SRC) > descriptor table limit
    THEN
        ZF ← 0;
    ELSE
        SegmentDescriptor ← descriptor referenced by SRC;
        IF SegmentDescriptor(Type) ≠ conforming code segment
        and ((CPL > DPL) or (RPL > DPL))
        or SegmentDescriptor(Type) is not valid for instruction
            THEN
                ZF ← 0;
            ELSE
                DEST ← access rights from SegmentDescriptor as given in Description section;
                ZF ← 1;
        FI;
FI;

Flags Affected 

The ZF flag is set to 1 if the access rights are loaded successfully; otherwise, it is cleared to 0. 
Protected Mode Exceptions

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
selector.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and the memory operand effective address is unaligned while
the current privilege level is 3.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#UD The LAR instruction is not recognized in real-address mode.

Virtual-8086 Mode Exceptions

#UD The LAR instruction cannot be executed in virtual-8086 mode.

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If the memory operand effective address referencing the SS segment is in a non-canonical
form.

#GP(0) If the memory operand effective address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and the memory operand effective address is unaligned while
the current privilege level is 3.

#UD If the LOCK prefix is used.


LDDQU-Load Unaligned Integer 128 Bits


Opcode/ 

Instruction 

Op/ 

En 

64/32-bit 

Mode 

CPUID 

Feature 

Flag 

Description 

F2 0F F0 /r

LDDQU xmm1, mem

RM 

V/V 

SSE3 

Load unaligned data from mem and return 
double quadword in xmm1.

VEX.128.F2.0F.WIG F0 /r

VLDDQU xmm1, m128

RM

V/V

AVX 

Load unaligned packed integer values from 
mem to xmm1.

VEX.256.F2.0F.WIG F0 /r

VLDDQU ymm1, m256

RM 

V/V 

AVX 

Load unaligned packed integer values from 
mem to ymm1.


Instruction Operand 

Encoding 

Op/En

Operand 1 

Operand 2 

Operand 3 

Operand 4 

RM 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 


Description 

The instruction is functionally similar to (V)MOVDQU ymm/xmm, m256/m128 for loading from memory. That is:
32/16 bytes of data starting at an address specified by the source memory operand (second operand) are fetched 
from memory and placed in a destination register (first operand). The source operand need not be aligned on a 
32/16-byte boundary. Up to 64/32 bytes may be loaded from memory; this is implementation dependent. 

This instruction may improve performance relative to (V)MOVDQU if the source operand crosses a cache line 
boundary. In situations that require the data loaded by (V)LDDQU be modified and stored to the same location, use 
(V)MOVDQU or (V)MOVDQA instead of (V)LDDQU. To move a double quadword to or from memory locations that 
are known to be aligned on 16-byte boundaries, use the (V)MOVDQA instruction. 

Implementation Notes 

• If the source is aligned to a 32/16-byte boundary, based on the implementation, the 32/16 bytes may be 
loaded more than once. For that reason, the usage of (V)LDDQU should be avoided when using uncached or 
write-combining (WC) memory regions. For uncached or WC memory regions, keep using (V)MOVDQU. 

• This instruction is a replacement for (V)MOVDQU (load) in situations where cache line splits significantly affect 
performance. It should not be used in situations where store-load forwarding is performance critical. If 
performance of store-load forwarding is critical to the application, use (V)MOVDQA store-load pairs when data 
is 256/128-bit aligned or (V)MOVDQU store-load pairs when data is 256/128-bit unaligned. 

• If the memory address is not aligned on 32/16-byte boundary, some implementations may load up to 64/32 
bytes and return 32/16 bytes in the destination. Some processor implementations may issue multiple loads to 
access the appropriate 32/16 bytes. Developers of multi-threaded or multi-processor software should be aware 
that on these processors the loads will be performed in a non-atomic way. 

• If alignment checking is enabled (CR0.AM = 1, RFLAGS.AC = 1, and CPL = 3), an alignment-check exception
(#AC) may or may not be generated (depending on processor implementation) when the memory address is 
not aligned on an 8-byte boundary. 

In 64-bit mode, use of the REX.R prefix permits this instruction to access additional registers (XMM8-XMM15). 
Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD.

Operation 

LDDQU (128-bit Legacy SSE version) 

DEST[127:0] ← SRC[127:0]

DEST[VLMAX-1:128] (Unmodified)

VLDDQU (VEX.128 encoded version)

DEST[127:0] ← SRC[127:0]

DEST[VLMAX-1:128] ← 0

VLDDQU (VEX.256 encoded version)

DEST[255:0] ← SRC[255:0]
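The three Operation variants differ only in what happens to the register bits above the loaded data. The C sketch below models that difference, assuming VLMAX = 256. `ymm_reg` and the three function names are hypothetical. `memcpy` stands in for the unaligned fetch, so the sketch does not model the implementation-specific extra loads or the atomicity caveats described above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: VLMAX = 256, so a YMM register is modelled as 32 bytes,
   with byte 0 holding bits 7:0. */
typedef struct { uint8_t b[32]; } ymm_reg;

/* LDDQU (legacy SSE): DEST[127:0] <- SRC[127:0]; DEST[VLMAX-1:128] unmodified. */
static void lddqu_legacy(ymm_reg *dst, const void *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst->b, src, 16);            /* memcpy has no alignment requirement */
}

/* VLDDQU (VEX.128): DEST[127:0] <- SRC[127:0]; DEST[VLMAX-1:128] <- 0. */
static void vlddqu_128(ymm_reg *dst, const void *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst->b, src, 16);
    memset(dst->b + 16, 0, 16);
}

/* VLDDQU (VEX.256): DEST[255:0] <- SRC[255:0]. */
static void vlddqu_256(ymm_reg *dst, const void *src) {
    assert(dst != NULL && src != NULL);
    memcpy(dst->b, src, 32);
}
```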

Intel C/C++ Compiler Intrinsic Equivalent 

LDDQU: __m128i _mm_lddqu_si128 (__m128i * p);
VLDDQU: __m256i _mm256_lddqu_si256 (__m256i * p);

Numeric Exceptions 

None 

Other Exceptions 

See Exceptions Type 4; 

Note treatment of #AC varies. 
LDMXCSR-Load MXCSR Register


Opcode/ 

Instruction 

Op/ 

En 

64/32-bit 

Mode 

CPUID 

Feature 

Flag 

Description 

0F AE /2

LDMXCSR m32 

M 

V/V 

SSE 

Load MXCSR register from m32. 

VEX.LZ.0F.WIG AE /2

VLDMXCSR m32 

M 

V/V

AVX 

Load MXCSR register from m32. 


Instruction Operand Encoding 


Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

M 

ModRM:r/m (r) 

NA 

NA 

NA 


Description 

Loads the source operand into the MXCSR control/status register. The source operand is a 32-bit memory location.
See "MXCSR Control and Status Register" in Chapter 10 of the Intel® 64 and IA-32 Architectures Software
Developer's Manual, Volume 1, for a description of the MXCSR register and its contents.

The LDMXCSR instruction is typically used in conjunction with the (V)STMXCSR instruction, which stores the 
contents of the MXCSR register in memory. 

The default MXCSR value at reset is 1F80H. 
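The default value decodes as all six exception-mask bits set, with round to nearest. The C sketch below uses hypothetical helper names to model that decoding and the #GP check listed under Other Exceptions. It assumes an MXCSR_MASK of 0000FFFFH, meaning bits 31:16 are reserved. Real software should take the mask from an FXSAVE image rather than hard-code it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical MXCSR_MASK: assumes every bit in 31:16 is reserved. */
#define MXCSR_MASK_ASSUMED 0x0000FFFFu

/* Models the reserved-bit check: returns false (and leaves *mxcsr unchanged)
   where LDMXCSR would raise #GP; otherwise performs MXCSR <- value. */
static bool ldmxcsr_model(uint32_t *mxcsr, uint32_t value) {
    assert(mxcsr != NULL);
    if (value & ~MXCSR_MASK_ASSUMED)
        return false;
    *mxcsr = value;
    return true;
}

/* Exception mask bits IM, DM, ZM, OM, UM, PM occupy MXCSR[12:7]. */
static unsigned mxcsr_exception_masks(uint32_t mxcsr) { return (mxcsr >> 7) & 0x3Fu; }

/* Rounding control occupies MXCSR[14:13]; 00B selects round to nearest. */
static unsigned mxcsr_rounding_control(uint32_t mxcsr) { return (mxcsr >> 13) & 0x3u; }
```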

If a (V)LDMXCSR instruction clears a SIMD floating-point exception mask bit and sets the corresponding exception 
flag bit, a SIMD floating-point exception will not be immediately generated. The exception will be generated only 
upon the execution of the next instruction that meets both conditions below: 

• the instruction must operate on an XMM or YMM register operand,

• the instruction causes that particular SIMD floating-point exception to be reported. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

An attempt to execute VLDMXCSR encoded with VEX.L = 1 causes an #UD exception.

Note: In VEX-encoded versions, VEX.vvvv is reserved and must be 1111b, otherwise instructions will #UD. 

Operation 

MXCSR ← m32;

C/C++ Compiler Intrinsic Equivalent 

_mm_setcsr(unsigned int i)

Numeric Exceptions 

None 

Other Exceptions 

See Exceptions Type 5; additionally 

#GP For an attempt to set reserved bits in MXCSR. 

#UD If VEX.vvvv ≠ 1111B.
LDS/LES/LFS/LGS/LSS-Load Far Pointer


Opcode 

Instruction 

Op/ 

En 

64-Bit 

Mode 

Compat/ 
Leg Mode 

Description 

C5 /r 

LDS r16, m16:16

RM 

Invalid 

Valid 

Load DS:r16 with far pointer from memory.

C5 /r 

LDS r32,m16:32 

RM 

Invalid 

Valid 

Load DS:r32 with far pointer from memory. 

0F B2 /r

LSS r16, m16:16

RM 

Valid 

Valid 

Load SS:r16 with far pointer from memory.

0F B2 /r

LSS r32,m16:32 

RM 

Valid 

Valid 

Load SS:r32 with far pointer from memory.

REX + 0F B2 /r

LSS r64, m16:64

RM 

Valid 

N.E. 

Load SS:r64 with far pointer from memory. 

C4 /r

LES r16, m16:16

RM 

Invalid 

Valid 

Load ES:r16 with far pointer from memory.

C4 /r

LES r32,m16:32 

RM 

Invalid 

Valid 

Load ES:r32 with far pointer from memory. 

0F B4 /r

LFS r16, m16:16

RM 

Valid 

Valid 

Load FS:r16 with far pointer from memory.

0F B4 /r

LFS r32,m16:32 

RM 

Valid 

Valid 

Load FS:r32 with far pointer from memory.

REX + 0F B4 /r

LFS r64,m16:64 

RM 

Valid 

N.E. 

Load FS:r64 with far pointer from memory. 

0F B5 /r

LGS r16, m16:16

RM 

Valid 

Valid 

Load GS:r16 with far pointer from memory.

0F B5 /r

LGS r32, m16:32

RM 

Valid 

Valid 

Load GS:r32 with far pointer from memory. 

REX + 0F B5 /r

LGS r64, m16:64

RM 

Valid 

N.E. 

Load GS:r64 with far pointer from memory. 


Instruction Operand 

Encoding 

Op/En 

Operand 1 

Operand 2 

Operand 3 

Operand 4 

RM 

ModRM:reg (w) 

ModRM:r/m (r) 

NA 

NA 


Description 

Loads a far pointer (segment selector and offset) from the second operand (source operand) into a segment 
register and the first operand (destination operand). The source operand specifies a 48-bit or a 32-bit pointer in 
memory depending on the current setting of the operand-size attribute (32 bits or 16 bits, respectively). The 
instruction opcode and the destination operand specify a segment register/general-purpose register pair. The 16-bit
segment selector from the source operand is loaded into the segment register specified with the opcode (DS,
SS, ES, FS, or GS). The 32-bit or 16-bit offset is loaded into the register specified with the destination operand. 

If one of these instructions is executed in protected mode, additional information from the segment descriptor 
pointed to by the segment selector in the source operand is loaded in the hidden part of the selected segment 
register. 

Also in protected mode, a NULL selector (values 0000 through 0003) can be loaded into DS, ES, FS, or GS registers 
without causing a protection exception. (Any subsequent reference to a segment whose corresponding segment 
register is loaded with a NULL selector causes a general-protection exception (#GP) and no memory reference to
the segment occurs.) 

In 64-bit mode, the instruction's default operation size is 32 bits. Using a REX prefix in the form of REX.W promotes 
operation to specify a source operand referencing an 80-bit pointer (16-bit selector, 64-bit offset) in memory. 
Using a REX prefix in the form of REX.R permits access to additional registers (R8-R15). See the summary chart at 
the beginning of this section for encoding data and limits. 
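The operand-size rule above fixes the memory layout of the far pointer. The offset (2, 4 or 8 bytes) comes first, at the lower address, and the 16-bit selector follows it. The C sketch below decodes such an operand from a little-endian byte buffer. `far_ptr` and `load_far_pointer` are hypothetical names, and the sketch performs none of the selector and descriptor checks listed under Operation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint16_t selector;  /* goes to DS, SS, ES, FS or GS */
    uint64_t offset;    /* goes to the destination general-purpose register */
} far_ptr;

/* offset_bytes = 2 (m16:16), 4 (m16:32) or 8 (m16:64 with REX.W). */
static far_ptr load_far_pointer(const uint8_t *mem, size_t offset_bytes) {
    assert(offset_bytes == 2 || offset_bytes == 4 || offset_bytes == 8);
    far_ptr p = { 0, 0 };
    for (size_t i = 0; i < offset_bytes; i++)
        p.offset |= (uint64_t)mem[i] << (8 * i);   /* little-endian offset first */
    p.selector = (uint16_t)(mem[offset_bytes] | (mem[offset_bytes + 1] << 8));
    return p;
}
```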

Operation 

64-BIT_MODE
    IF SS is loaded
        THEN
            IF SegmentSelector = NULL and ((RPL = 3) or
            (RPL ≠ 3 and RPL ≠ CPL))
                THEN #GP(0);
            ELSE IF descriptor is in non-canonical space
                THEN #GP(0); FI;
            ELSE IF Segment selector index is not within descriptor table limits
            or segment selector RPL ≠ CPL
            or access rights indicate nonwritable data segment
            or DPL ≠ CPL
                THEN #GP(selector); FI;
            ELSE IF Segment marked not present
                THEN #SS(selector); FI;
            FI;
            SS ← SegmentSelector(SRC);
            SS ← SegmentDescriptor([SRC]);
    ELSE IF attempt to load DS or ES
        THEN #UD;
    ELSE IF FS or GS is loaded with non-NULL segment selector
        THEN IF Segment selector index is not within descriptor table limits
            or access rights indicate segment neither data nor readable code segment
            or segment is data or nonconforming-code segment
            and (RPL > DPL or CPL > DPL)
                THEN #GP(selector); FI;
            ELSE IF Segment marked not present
                THEN #NP(selector); FI;
            FI;
            SegmentRegister ← SegmentSelector(SRC);
            SegmentRegister ← SegmentDescriptor([SRC]);
        FI;
    ELSE IF FS or GS is loaded with a NULL selector:
        THEN
            SegmentRegister ← NULLSelector;
            SegmentRegister(DescriptorValidBit) ← 0; FI; (* Hidden flag;
            not accessible by software *)
    FI;
    DEST ← Offset(SRC);

PROTECTED MODE OR COMPATIBILITY MODE;
    IF SS is loaded
        THEN
            IF SegmentSelector = NULL
                THEN #GP(0);
            ELSE IF Segment selector index is not within descriptor table limits
            or segment selector RPL ≠ CPL
            or access rights indicate nonwritable data segment
            or DPL ≠ CPL
                THEN #GP(selector); FI;
            ELSE IF Segment marked not present
                THEN #SS(selector); FI;
            FI;
            SS ← SegmentSelector(SRC);
            SS ← SegmentDescriptor([SRC]);
    ELSE IF DS, ES, FS, or GS is loaded with non-NULL segment selector
        THEN IF Segment selector index is not within descriptor table limits
            or access rights indicate segment neither data nor readable code segment
            or segment is data or nonconforming-code segment
            and (RPL > DPL or CPL > DPL)
                THEN #GP(selector); FI;
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ELSE IF Segment marked not present 
THEN #NP(selector); FI; 

FI; 

SegmentRegister SegmentSelector(SRC) AND RPL; 
SegmentRegIster SegmentDescrlptor([SRC]); 

FI; 

ELSE IF DS, ES, FS, or GS Is loaded with a NULL selector: 

THEN 

SegmentRegister NULLSelector; 
SegmentReglster(Descrlptor\/alldBlt) <- 0; FI; (* Hidden flag; 
not accessible by software *) 

FI; 

DEST ^ Offset(SRC); 

Real-Address or Vlrtual-SOSG Mode 

SegmentRegister SegmentSelector(SRC); FI; 

DEST ^ Offset(SRC); 
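The NULL-selector test for SS is the least obvious part of the pseudocode above. The following C function is a minimal sketch of just that rule, assuming the standard selector layout (index in bits 15:3, TI in bit 2, RPL in bits 1:0); the function and helper names are hypothetical, and the limit, type, DPL, and present checks are deliberately left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Segment selector layout: bits 15:3 index, bit 2 TI, bits 1:0 RPL. */
static unsigned selector_rpl(uint16_t sel) { return sel & 3u; }

/* A NULL selector has index 0 and TI 0; its RPL bits may be anything. */
static bool selector_is_null(uint16_t sel) { return (sel & 0xFFFCu) == 0; }

/* Sketch of only the NULL-selector rule for loading SS (e.g. by LSS).
   Returns true when the load raises #GP(0). Descriptor-table limits,
   segment type, DPL, and presence checks are not modeled. */
bool ss_null_load_raises_gp0(uint16_t sel, unsigned cpl, bool mode64) {
    if (!selector_is_null(sel))
        return false;            /* non-NULL: decided by the other checks */
    if (!mode64)
        return true;             /* protected/compatibility mode: always #GP(0) */
    unsigned rpl = selector_rpl(sel);
    /* (RPL = 3) or (RPL != 3 and RPL != CPL) reduces to RPL == 3 || RPL != CPL */
    return rpl == 3 || rpl != cpl;
}
```

In 64-bit mode the net effect is that a NULL SS is accepted only at CPL 0, 1, or 2 and only when the selector's RPL equals CPL.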

Flags Affected 

None 


Protected Mode Exceptions 

#UD If source operand is not a memory location.

If the LOCK prefix is used.

#GP(0) If a NULL selector is loaded into the SS register.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment
selector.

#GP(selector) If the SS register is being loaded and any of the following is true: the segment selector index
is not within the descriptor table limits, the segment selector RPL is not equal to CPL, the
segment is a nonwritable data segment, or DPL is not equal to CPL.

If the DS, ES, FS, or GS register is being loaded with a non-NULL segment selector and any of
the following is true: the segment selector index is not within descriptor table limits, the
segment is neither a data nor a readable code segment, or the segment is a data or
nonconforming-code segment and either RPL or CPL is greater than DPL.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#SS(selector) If the SS register is being loaded and the segment is marked not present.

#NP(selector) If the DS, ES, FS, or GS register is being loaded with a non-NULL segment selector and the
segment is marked not present.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.


Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS If a memory operand effective address is outside the SS segment limit.

#UD If source operand is not a memory location.

If the LOCK prefix is used.
Virtual-8086 Mode Exceptions

#UD If source operand is not a memory location.

If the LOCK prefix is used.

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.


Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#GP(0) If the memory address is in a non-canonical form.

If a NULL selector is attempted to be loaded into the SS register in compatibility mode.

If a NULL selector is attempted to be loaded into the SS register in CPL3 and 64-bit mode.

If a NULL selector is attempted to be loaded into the SS register in non-CPL3 and 64-bit mode
where its RPL is not equal to CPL.

#GP(selector) If the FS or GS register is being loaded with a non-NULL segment selector and any of the
following is true: the segment selector index is not within descriptor table limits, the memory
address of the descriptor is non-canonical, the segment is neither a data nor a readable code
segment, or the segment is a data or nonconforming-code segment and either RPL or CPL is
greater than DPL.

If the SS register is being loaded and any of the following is true: the segment selector index
is not within the descriptor table limits, the memory address of the descriptor is non-canonical,
the segment selector RPL is not equal to CPL, the segment is a nonwritable data segment, or
DPL is not equal to CPL.

#SS(0) If a memory operand effective address is non-canonical.

#SS(selector) If the SS register is being loaded and the segment is marked not present.

#NP(selector) If the FS or GS register is being loaded with a non-NULL segment selector and the segment is
marked not present.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If source operand is not a memory location.

If the LOCK prefix is used.




INSTRUCTION SET REFERENCE, A-L 


LEA—Load Effective Address 


Opcode         Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
8D /r          LEA r16,m    RM     Valid        Valid            Store effective address for m in register r16.
8D /r          LEA r32,m    RM     Valid        Valid            Store effective address for m in register r32.
REX.W + 8D /r  LEA r64,m    RM     Valid        N.E.             Store effective address for m in register r64.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2      Operand 3  Operand 4
RM     ModRM:reg (w)  ModRM:r/m (r)  NA         NA


Description 

Computes the effective address of the second operand (the source operand) and stores it in the first operand
(destination operand). The source operand is a memory address (offset part) specified with one of the processor's
addressing modes; the destination operand is a general-purpose register. The address-size and operand-size
attributes affect the action performed by this instruction, as shown in the following table. The operand-size attribute of
the instruction is determined by the chosen register; the address-size attribute is determined by the attribute of
the code segment.


Table 3-53. Non-64-bit Mode LEA Operation with Address and Operand Size Attributes

Operand Size  Address Size  Action Performed
16            16            16-bit effective address is calculated and stored in requested 16-bit register destination.
16            32            32-bit effective address is calculated. The lower 16 bits of the address are stored in the
                            requested 16-bit register destination.
32            16            16-bit effective address is calculated. The 16-bit address is zero-extended and stored in the
                            requested 32-bit register destination.
32            32            32-bit effective address is calculated and stored in the requested 32-bit register destination.


Different assemblers may use different algorithms based on the size attribute and symbolic reference of the source 
operand. 

In 64-bit mode, the instruction's destination operand is governed by the operand-size attribute; the default operand
size is 32 bits. Address calculation is governed by the address-size attribute; the default address size is 64 bits. In
64-bit mode, an address size of 16 bits is not encodable. See Table 3-54.


Table 3-54. 64-bit Mode LEA Operation with Address and Operand Size Attributes

Operand Size  Address Size  Action Performed
16            32            32-bit effective address is calculated (using 67H prefix). The lower 16 bits of the address are
                            stored in the requested 16-bit register destination (using 66H prefix).
16            64            64-bit effective address is calculated (default address size). The lower 16 bits of the address
                            are stored in the requested 16-bit register destination (using 66H prefix).
32            32            32-bit effective address is calculated (using 67H prefix) and stored in the requested 32-bit
                            register destination.
32            64            64-bit effective address is calculated (default address size) and the lower 32 bits of the
                            address are stored in the requested 32-bit register destination.
64            32            32-bit effective address is calculated (using 67H prefix), zero-extended to 64 bits, and stored
                            in the requested 64-bit register destination (using REX.W).
64            64            64-bit effective address is calculated (default address size) and all 64 bits of the address are
                            stored in the requested 64-bit register destination (using REX.W).
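Tables 3-53 and 3-54 reduce to two truncations: the sum is first cut to the address size, then cut (or implicitly zero-extended) to the operand size. The function below is a hypothetical model of that rule, not Intel code; `lea_result` and its parameters are invented for illustration, and it ignores the restricted base/index register combinations of 16-bit addressing.

```c
#include <stdint.h>

/* Keep the low `bits` bits of v (bits is 16, 32, or 64). */
static uint64_t low_bits(uint64_t v, unsigned bits) {
    return bits >= 64 ? v : v & ((UINT64_C(1) << bits) - 1);
}

/* Hypothetical model of LEA's size rules: base + index*scale + disp is
   computed with wraparound, truncated to the address size, then truncated
   to the operand size. A narrower address in a wider register ends up
   zero-extended because the upper bits are already 0. */
uint64_t lea_result(uint64_t base, uint64_t index, unsigned scale,
                    int64_t disp, unsigned addr_size, unsigned op_size) {
    uint64_t ea = base + index * scale + (uint64_t)disp;  /* mod 2^64 */
    ea = low_bits(ea, addr_size);   /* effective address at address size */
    return low_bits(ea, op_size);   /* value written to the destination */
}
```

For example, with a 16-bit address size and a 32-bit operand size, 0xFFFF + 1 + 1 wraps to 0x0001 before it is stored, matching the "zero-extended" row of Table 3-53.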
Operation

IF OperandSize = 16 and AddressSize = 16
    THEN
        DEST ← EffectiveAddress(SRC); (* 16-bit address *)
    ELSE IF OperandSize = 16 and AddressSize = 32
        THEN
            temp ← EffectiveAddress(SRC); (* 32-bit address *)
            DEST ← temp[0:15]; (* 16-bit address *)
        FI;
    ELSE IF OperandSize = 32 and AddressSize = 16
        THEN
            temp ← EffectiveAddress(SRC); (* 16-bit address *)
            DEST ← ZeroExtend(temp); (* 32-bit address *)
        FI;
    ELSE IF OperandSize = 32 and AddressSize = 32
        THEN
            DEST ← EffectiveAddress(SRC); (* 32-bit address *)
        FI;
    ELSE IF OperandSize = 16 and AddressSize = 64
        THEN
            temp ← EffectiveAddress(SRC); (* 64-bit address *)
            DEST ← temp[0:15]; (* 16-bit address *)
        FI;
    ELSE IF OperandSize = 32 and AddressSize = 64
        THEN
            temp ← EffectiveAddress(SRC); (* 64-bit address *)
            DEST ← temp[0:31]; (* 32-bit address *)
        FI;
    ELSE IF OperandSize = 64 and AddressSize = 32
        THEN
            temp ← EffectiveAddress(SRC); (* 32-bit address *)
            DEST ← ZeroExtend(temp); (* 64-bit address *)
        FI;
    ELSE IF OperandSize = 64 and AddressSize = 64
        THEN
            DEST ← EffectiveAddress(SRC); (* 64-bit address *)
        FI;
FI;

Flags Affected 

None 

Protected Mode Exceptions 

#UD If source operand is not a memory location. 

If the LOCK prefix is used. 

Real-Address Mode Exceptions 

Same exceptions as in protected mode. 

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

Same exceptions as in protected mode. 
LEAVE—High Level Procedure Exit

Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
C9      LEAVE        NP     Valid        Valid            Set SP to BP, then pop BP.
C9      LEAVE        NP     N.E.         Valid            Set ESP to EBP, then pop EBP.
C9      LEAVE        NP     Valid        N.E.             Set RSP to RBP, then pop RBP.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Releases the stack frame set up by an earlier ENTER instruction. The LEAVE instruction copies the frame pointer (in 
the EBP register) into the stack pointer register (ESP), which releases the stack space allocated to the stack frame. 
The old frame pointer (the frame pointer for the calling procedure that was saved by the ENTER instruction) is then 
popped from the stack into the EBP register, restoring the calling procedure's stack frame. 

A RET instruction is commonly executed following a LEAVE instruction to return program control to the calling 
procedure. 

See "Procedure Calls for Block-Structured Languages" in Chapter 7 of the Intel® 64 and IA-32 Architectures
Software Developer's Manual, Volume 1, for detailed information on the use of the ENTER and LEAVE instructions.

In 64-bit mode, the instruction's default operation size is 64 bits; 32-bit operation cannot be encoded. See the 
summary chart at the beginning of this section for encoding data and limits. 

Operation 

IF StackAddressSize = 32
    THEN
        ESP ← EBP;
    ELSE IF StackAddressSize = 64
        THEN RSP ← RBP; FI;
    ELSE IF StackAddressSize = 16
        THEN SP ← BP; FI;
FI;
IF OperandSize = 32
    THEN EBP ← Pop();
    ELSE IF OperandSize = 64
        THEN RBP ← Pop(); FI;
    ELSE IF OperandSize = 16
        THEN BP ← Pop(); FI;
FI;
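The two steps of the 64-bit form can be traced on a toy machine. This is a sketch under stated assumptions: `toy_cpu`, `toy_pop64`, and `toy_leave64` are hypothetical names, a 256-byte array stands in for the stack, RSP and RBP are offsets into it, and there are no segment, canonical-address, paging, or bounds checks (RBP must leave 8 readable bytes).

```c
#include <stdint.h>
#include <string.h>

/* Toy machine state: mem stands in for the stack; rsp/rbp index into it. */
typedef struct {
    uint64_t rsp, rbp;
    uint8_t mem[256];
} toy_cpu;

static uint64_t toy_pop64(toy_cpu *c) {
    uint64_t v;
    memcpy(&v, &c->mem[c->rsp], sizeof v);  /* host byte order */
    c->rsp += 8;
    return v;
}

/* 64-bit LEAVE: RSP <- RBP, then RBP <- Pop(). */
void toy_leave64(toy_cpu *c) {
    c->rsp = c->rbp;        /* discard everything allocated below the frame pointer */
    c->rbp = toy_pop64(c);  /* restore the caller's frame pointer */
}
```

After the call, RSP points just past the saved frame pointer, which is where a following RET expects the return address.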

Flags Affected 

None 
Protected Mode Exceptions

#SS(0) If the EBP register points to a location that is not within the limits of the current stack
segment.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.

Real-Address Mode Exceptions

#GP If the EBP register points to a location outside of the effective address space from 0 to FFFFH.

#UD If the LOCK prefix is used.

Virtual-8086 Mode Exceptions

#GP(0) If the EBP register points to a location outside of the effective address space from 0 to FFFFH.

#PF(fault-code) If a page fault occurs.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made.

#UD If the LOCK prefix is used.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

#SS(0) If the stack address is in a non-canonical form.

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the
current privilege level is 3.

#UD If the LOCK prefix is used.
LFENCE—Load Fence

Opcode    Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
0F AE E8  LFENCE       NP     Valid        Valid            Serializes load operations.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Performs a serializing operation on all load-from-memory instructions that were issued prior to the LFENCE
instruction. Specifically, LFENCE does not execute until all prior instructions have completed locally, and no later
instruction begins execution until LFENCE completes. In particular, an instruction that loads from memory and that
precedes an LFENCE receives data from memory prior to completion of the LFENCE. (An LFENCE that follows an 
instruction that stores to memory might complete before the data being stored have become globally visible.) 
Instructions following an LFENCE may be fetched from memory before the LFENCE, but they will not execute until 
the LFENCE completes. 

Weakly ordered memory types can be used to achieve higher processor performance through such techniques as 
out-of-order issue and speculative reads. The degree to which a consumer of data recognizes or knows that the 
data is weakly ordered varies among applications and may be unknown to the producer of this data. The LFENCE 
instruction provides a performance-efficient way of ensuring load ordering between routines that produce
weakly-ordered results and routines that consume that data.

Processors are free to fetch and cache data speculatively from regions of system memory that use the WB, WC, 
and WT memory types. This speculative fetching can occur at any time and is not tied to instruction execution. 
Thus, it is not ordered with respect to executions of the LFENCE instruction; data can be brought into the caches 
speculatively just before, during, or after the execution of an LFENCE instruction. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

Specification of the instruction's opcode above indicates a ModR/M byte of E8. For this instruction, the processor
ignores the r/m field of the ModR/M byte. Thus, LFENCE is encoded by any opcode of the form 0F AE Ex, where x is
in the range 8-F.
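Because the r/m field is ignored, a decoder recognizing LFENCE masks off the low three bits of the ModR/M byte. The helper below is a hypothetical sketch of that check (the name `is_lfence_encoding` is invented); it assumes the bytes start at the 0F opcode byte and does not handle prefixes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if the first three bytes are an LFENCE encoding:
   0F AE followed by a ModR/M byte with mod = 11b and reg = 101b.
   The r/m field (low three bits) is ignored, so E8h..EFh all match.
   Prefix bytes are not handled by this sketch. */
bool is_lfence_encoding(const uint8_t *code, size_t len) {
    if (len < 3)
        return false;
    return code[0] == 0x0F && code[1] == 0xAE && (code[2] & 0xF8) == 0xE8;
}
```

In C source, code normally requests the fence through the `_mm_lfence` intrinsic listed below rather than by emitting these bytes directly.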

Operation 

Wait_On_Following_Instructions_Until(preceding_instructions_complete);

Intel C/C++ Compiler Intrinsic Equivalent 

void _mm_lfence(void) 

Exceptions (All Modes of Operation) 

#UD If CPUID.01H:EDX.SSE2[bit 26] = 0. 

If the LOCK prefix is used. 


LGDT/LIDT—Load Global/Interrupt Descriptor Table Register

Opcode    Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
0F 01 /2  LGDT m16&32  M      N.E.         Valid            Load m into GDTR.
0F 01 /3  LIDT m16&32  M      N.E.         Valid            Load m into IDTR.
0F 01 /2  LGDT m16&64  M      Valid        N.E.             Load m into GDTR.
0F 01 /3  LIDT m16&64  M      Valid        N.E.             Load m into IDTR.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (r)  NA         NA         NA


Description 

Loads the values in the source operand into the global descriptor table register (GDTR) or the interrupt descriptor
table register (IDTR). The source operand specifies a 6-byte memory location that contains the base address (a
linear address) and the limit (size of table in bytes) of the global descriptor table (GDT) or the interrupt descriptor
table (IDT). If the operand-size attribute is 32 bits, a 16-bit limit (lower 2 bytes of the 6-byte data operand) and a
32-bit base address (upper 4 bytes of the data operand) are loaded into the register. If the operand-size attribute
is 16 bits, a 16-bit limit (lower 2 bytes) and a 24-bit base address (third, fourth, and fifth byte) are loaded. Here,
the high-order byte of the operand is not used and the high-order byte of the base address in the GDTR or IDTR is
filled with zeros.
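The byte layout described above can be checked with a small decoder. This is a hypothetical sketch, not Intel code: `dtr_value` and `decode_pseudo_descriptor` are invented names, the operand bytes are read in memory (little-endian) order, and the 10-byte 64-bit form follows the 8-byte base plus 2-byte limit described in the next paragraph.

```c
#include <stdint.h>

/* Register image after LGDT/LIDT: a 16-bit limit and a base address. */
typedef struct {
    uint16_t limit;
    uint64_t base;
} dtr_value;

/* Hypothetical decoder for the LGDT/LIDT memory operand. src points at the
   operand bytes in memory order; op_size is 16, 32, or 64. The caller must
   supply 6 bytes, or 10 bytes when op_size is 64. */
dtr_value decode_pseudo_descriptor(const uint8_t *src, unsigned op_size) {
    dtr_value r;
    unsigned base_bytes = (op_size == 64) ? 8u : 4u;
    r.limit = (uint16_t)(src[0] | (src[1] << 8));   /* bytes 0-1: limit */
    r.base = 0;
    for (unsigned i = 0; i < base_bytes; i++)       /* little-endian base */
        r.base |= (uint64_t)src[2 + i] << (8 * i);
    if (op_size == 16)
        r.base &= 0x00FFFFFFu;  /* 24-bit base; high-order byte zero-filled */
    return r;
}
```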

The LGDT and LIDT instructions are used only in operating-system software; they are not used in application 
programs. They are the only instructions that directly load a linear address (that is, not a segment-relative 
address) and a limit in protected mode. They are commonly executed in real-address mode to allow processor 
initialization prior to switching to protected mode. 

In 64-bit mode, the instruction's operand size is fixed at 8+2 bytes (an 8-byte base and a 2-byte limit). See the
summary chart at the beginning of this section for encoding data and limits.

See "SGDT—Store Global Descriptor Table Register" in Chapter 4, Intel® 64 and IA-32 Architectures Software 
Developer's Manual, Volume 2B, for information on storing the contents of the GDTR and IDTR. 
Operation

IF Instruction is LIDT
    THEN
        IF OperandSize = 16
            THEN
                IDTR(Limit) ← SRC[0:15];
                IDTR(Base) ← SRC[16:47] AND 00FFFFFFH;
            ELSE IF 32-bit Operand Size
                THEN
                    IDTR(Limit) ← SRC[0:15];
                    IDTR(Base) ← SRC[16:47];
                FI;
            ELSE IF 64-bit Operand Size (* In 64-Bit Mode *)
                THEN
                    IDTR(Limit) ← SRC[0:15];
                    IDTR(Base) ← SRC[16:79];
                FI;
        FI;
    ELSE (* Instruction is LGDT *)
        IF OperandSize = 16
            THEN
                GDTR(Limit) ← SRC[0:15];
                GDTR(Base) ← SRC[16:47] AND 00FFFFFFH;
            ELSE IF 32-bit Operand Size
                THEN
                    GDTR(Limit) ← SRC[0:15];
                    GDTR(Base) ← SRC[16:47];
                FI;
            ELSE IF 64-bit Operand Size (* In 64-Bit Mode *)
                THEN
                    GDTR(Limit) ← SRC[0:15];
                    GDTR(Base) ← SRC[16:79];
                FI;
        FI;
FI;

Flags Affected 

None 

Protected Mode Exceptions 

#UD If source operand is not a memory location. 

If the LOCK prefix is used. 

#GP(0) If the current privilege level is not 0. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 


Real-Address Mode Exceptions

#UD If source operand is not a memory location. 

If the LOCK prefix is used. 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

Virtual-8086 Mode Exceptions

#UD If source operand is not a memory location. 

If the LOCK prefix is used. 

#GP(0) The LGDT and LIDT instructions are not recognized in virtual-8086 mode. 

#GP If the current privilege level is not 0. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 


64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the current privilege level is not 0.

If the memory address is in a non-canonical form.

#UD If source operand is not a memory location.

If the LOCK prefix is used.

#PF(fault-code) If a page fault occurs.
LLDT—Load Local Descriptor Table Register

Opcode    Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
0F 00 /2  LLDT r/m16   M      Valid        Valid            Load segment selector r/m16 into LDTR.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (r)  NA         NA         NA


Description 

Loads the source operand into the segment selector field of the local descriptor table register (LDTR). The source 
operand (a general-purpose register or a memory location) contains a segment selector that points to a local 
descriptor table (LDT). After the segment selector is loaded in the LDTR, the processor uses the segment selector 
to locate the segment descriptor for the LDT in the global descriptor table (GDT). It then loads the segment limit 
and base address for the LDT from the segment descriptor into the LDTR. The segment registers DS, ES, SS, FS, 
GS, and CS are not affected by this instruction, nor is the LDTR field in the task state segment (TSS) for the current 
task. 

If bits 2-15 of the source operand are 0, LDTR is marked invalid and the LLDT instruction completes silently.
However, all subsequent references to descriptors in the LDT (except by the LAR, VERR, VERW or LSL instructions)
cause a general protection exception (#GP).

The operand-size attribute has no effect on this instruction. 

The LLDT instruction is provided for use in operating-system software; it should not be used in application 
programs. This instruction can only be executed in protected mode or 64-bit mode. 

In 64-bit mode, the operand size is fixed at 16 bits. 

Operation 

IF SRC(Offset) > descriptor table limit
    THEN #GP(segment selector); FI;
IF segment selector is valid
    Read segment descriptor;
    IF SegmentDescriptor(Type) ≠ LDT
        THEN #GP(segment selector); FI;
    IF segment descriptor is not present
        THEN #NP(segment selector); FI;
    LDTR(SegmentSelector) ← SRC;
    LDTR(SegmentDescriptor) ← GDTSegmentDescriptor;
ELSE LDTR ← INVALID
FI;
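The selector checks that happen before any descriptor is read can be sketched as follows. This is a hypothetical illustration (the enum and function names are invented) that assumes 8-byte GDT descriptors as in legacy protected mode; the type (LDT) and present checks, which need the descriptor contents, are not modeled.

```c
#include <stdint.h>

typedef enum {
    LLDT_MARK_INVALID,    /* bits 2-15 zero: LDTR marked invalid, no fault */
    LLDT_RAISE_GP,        /* #GP(segment selector) */
    LLDT_READ_DESCRIPTOR  /* continue with the type and present checks */
} lldt_step;

/* Hypothetical sketch of LLDT's selector checks. gdt_limit is the GDTR
   limit; descriptors are assumed to be 8 bytes long. */
lldt_step lldt_selector_step(uint16_t sel, uint32_t gdt_limit) {
    if ((sel & 0xFFFCu) == 0)
        return LLDT_MARK_INVALID;          /* NULL selector */
    if (sel & 0x4u)
        return LLDT_RAISE_GP;              /* TI = 1: does not point into the GDT */
    uint32_t last_byte = (uint32_t)(sel >> 3) * 8u + 7u;
    if (last_byte > gdt_limit)
        return LLDT_RAISE_GP;              /* descriptor beyond GDT limit */
    return LLDT_READ_DESCRIPTOR;
}
```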

Flags Affected 

None 


Protected Mode Exceptions

#GP(0) If the current privilege level is not 0.

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit.

If the DS, ES, FS, or GS register contains a NULL segment selector.

#GP(selector) If the selector operand does not point into the Global Descriptor Table or if the entry in the GDT
is not a Local Descriptor Table.

Segment selector is beyond GDT limit.

#SS(0) If a memory operand effective address is outside the SS segment limit.

#NP(selector) If the LDT descriptor is not present.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.


Real-Address Mode Exceptions 

#UD The LLDT instruction is not recognized in real-address mode. 

Virtual-8086 Mode Exceptions

#UD The LLDT instruction is not recognized in virtual-8086 mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the current privilege level is not 0.

If the memory address is in a non-canonical form.

#GP(selector) If the selector operand does not point into the Global Descriptor Table or if the entry in the GDT
is not a Local Descriptor Table.

Segment selector is beyond GDT limit.

#NP(selector) If the LDT descriptor is not present.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.
LMSW—Load Machine Status Word

Opcode    Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
0F 01 /6  LMSW r/m16   M      Valid        Valid            Loads r/m16 in machine status word of CR0.


Instruction Operand Encoding

Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (r)  NA         NA         NA


Description 

Loads the source operand into the machine status word, bits 0 through 15 of register CR0. The source operand can
be a 16-bit general-purpose register or a memory location. Only the low-order 4 bits of the source operand (which
contain the PE, MP, EM, and TS flags) are loaded into CR0. The PG, CD, NW, AM, WP, NE, and ET flags of CR0 are
not affected. The operand-size attribute has no effect on this instruction.

If the PE flag of the source operand (bit 0) is set to 1, the instruction causes the processor to switch to protected 
mode. While in protected mode, the LMSW instruction cannot be used to clear the PE flag and force a switch back 
to real-address mode. 

The LMSW instruction is provided for use in operating-system software; it should not be used in application 
programs. In protected or virtual-8086 mode, it can only be executed at CPL 0. 

This instruction is provided for compatibility with the Intel 286 processor; programs and procedures intended to
run on IA-32 and Intel 64 processors beginning with Intel386 processors should use the MOV (control registers)
instruction to load the whole CR0 register. The MOV CR0 instruction can be used to set and clear the PE flag in CR0,
allowing a procedure or program to switch between protected and real-address modes.

This instruction is a serializing instruction. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. Note that the operand size is fixed 
at 16 bits. 

See "Changes to Instruction Behavior in VMX Non-Root Operation" in Chapter 25 of the Intel® 64 and IA-32
Architectures Software Developer's Manual, Volume 3C, for more information about the behavior of this instruction in
VMX non-root operation.

Operation 

CR0[0:3] ← SRC[0:3];
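The one-line operation above omits the rule from the Description that LMSW cannot clear PE. The function below is a hypothetical sketch that combines both (the name `lmsw_new_cr0` is invented): bits 0-3 come from the source, every other CR0 bit is preserved, and an already-set PE stays set.

```c
#include <stdint.h>

/* Hypothetical model of LMSW's effect on CR0. Only bits 0-3 (PE, MP, EM,
   TS) are taken from src; all other CR0 bits are kept. PE (bit 0) can be
   set but not cleared, so an existing PE bit is ORed back in. */
uint64_t lmsw_new_cr0(uint64_t cr0, uint16_t src) {
    uint64_t pe_sticky = cr0 & 0x1u;
    return (cr0 & ~UINT64_C(0xF)) | (src & 0xFu) | pe_sticky;
}
```

For example, executing LMSW with a source of 0 while in protected mode leaves PE set, which is why switching back to real-address mode requires MOV CR0 instead.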

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment 
selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#UD If the LOCK prefix is used. 


Virtual-8086 Mode Exceptions

#GP(0) The LMSW instruction is not recognized in virtual-8086 mode.

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions

#SS(0) If a memory address referencing the SS segment is in a non-canonical form.

#GP(0) If the current privilege level is not 0.

If the memory address is in a non-canonical form.

#PF(fault-code) If a page fault occurs.

#UD If the LOCK prefix is used.
LOCK—Assert LOCK# Signal Prefix

Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
F0      LOCK         NP     Valid        Valid            Asserts LOCK# signal for duration of the
                                                          accompanying instruction.


NOTES: 

* See IA-32 Architecture Compatibility section below. 


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Causes the processor's LOCK# signal to be asserted during execution of the accompanying instruction (turns the 
instruction into an atomic instruction). In a multiprocessor environment, the LOCK# signal ensures that the 
processor has exclusive use of any shared memory while the signal is asserted. 

In most IA-32 and all Intel 64 processors, locking may occur without the LOCK# signal being asserted. See the
"IA-32 Architecture Compatibility" section below for more details.

The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions
where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCHG8B,
CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of
these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be
generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not
in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of
the LOCK prefix.

The LOCK prefix is typically used with the BTS instruction to perform a read-modify-write operation on a memory
location in a shared memory environment.
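Portable C code usually gets a locked read-modify-write such as LOCK BTS through C11 atomics rather than writing the prefix by hand. The sketch below is an illustration under that assumption: `atomic_test_and_set_bit` is a hypothetical helper built on the standard `atomic_fetch_or`, and which LOCK-prefixed instruction an x86 compiler emits for it (for example LOCK BTS, LOCK OR, or a LOCK CMPXCHG loop) is compiler-dependent.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Atomically set bit `bit` (0-31) of *word and return its previous value,
   the same effect as LOCK BTS on a 32-bit memory operand. */
bool atomic_test_and_set_bit(_Atomic uint32_t *word, unsigned bit) {
    uint32_t mask = UINT32_C(1) << bit;
    uint32_t old = atomic_fetch_or(word, mask);  /* one atomic read-modify-write */
    return (old & mask) != 0;
}
```

A caller can treat a false return as "this thread set the bit first", which is the usual spin-lock or claim-a-slot pattern built on LOCK BTS.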

The integrity of the LOCK prefix is not affected by the alignment of the memory field. Memory locking is observed 
for arbitrarily misaligned fields. 

This instruction's operation is the same in non-64-bit modes and 64-bit mode. 

IA-32 Architecture Compatibility 

Beginning with the P6 family processors, when the LOCK prefix is prefixed to an instruction and the memory area
being accessed is cached internally in the processor, the LOCK# signal is generally not asserted. Instead, only the
processor's cache is locked. Here, the processor's cache coherency mechanism ensures that the operation is
carried out atomically with regard to memory. See "Effects of a Locked Operation on Internal Processor Caches"
in Chapter 8 of the Intel® 64 and IA-32 Architectures Software Developer's Manual, Volume 3A, for more
information on locking of caches.

Operation 

AssertLOCK#(DurationOfAccompanyingInstruction);

Flags Affected 

None 

Protected Mode Exceptions 

#UD If the LOCK prefix is used with an instruction not listed: ADD, ADC, AND, BTC, BTR, BTS,
CMPXCHG, CMPXCHG8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD,
XCHG.

Other exceptions can be generated by the instruction when the LOCK prefix is applied. 


Real-Address Mode Exceptions

Same exceptions as in protected mode.

Virtual-8086 Mode Exceptions

Same exceptions as in protected mode.

Compatibility Mode Exceptions

Same exceptions as in protected mode.

64-Bit Mode Exceptions

Same exceptions as in protected mode.
LODS/LODSB/LODSW/LODSD/LODSQ—Load String

Opcode      Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
AC          LODS m8      NP     Valid        Valid            For legacy mode, load byte at address DS:(E)SI into AL.
                                                              For 64-bit mode, load byte at address (R)SI into AL.
AD          LODS m16     NP     Valid        Valid            For legacy mode, load word at address DS:(E)SI into AX.
                                                              For 64-bit mode, load word at address (R)SI into AX.
AD          LODS m32     NP     Valid        Valid            For legacy mode, load dword at address DS:(E)SI into EAX.
                                                              For 64-bit mode, load dword at address (R)SI into EAX.
REX.W + AD  LODS m64     NP     Valid        N.E.             Load qword at address (R)SI into RAX.
AC          LODSB        NP     Valid        Valid            For legacy mode, load byte at address DS:(E)SI into AL.
                                                              For 64-bit mode, load byte at address (R)SI into AL.
AD          LODSW        NP     Valid        Valid            For legacy mode, load word at address DS:(E)SI into AX.
                                                              For 64-bit mode, load word at address (R)SI into AX.
AD          LODSD        NP     Valid        Valid            For legacy mode, load dword at address DS:(E)SI into EAX.
                                                              For 64-bit mode, load dword at address (R)SI into EAX.
REX.W + AD  LODSQ        NP     Valid        N.E.             Load qword at address (R)SI into RAX.


Instruction Operand Encoding

Op/En  Operand 1  Operand 2  Operand 3  Operand 4
NP     NA         NA         NA         NA


Description 

Loads a byte, word, or doubleword from the source operand into the AL, AX, or EAX register, respectively. The 
source operand is a memory location, the address of which is read from the DS:ESI or the DS:SI registers 
(depending on the address-size attribute of the instruction, 32 or 16, respectively). The DS segment may be
overridden with a segment override prefix.

At the assembly-code level, two forms of this instruction are allowed: the "explicit-operands" form and the
"no-operands" form. The explicit-operands form (specified with the LODS mnemonic) allows the source operand to be
specified explicitly. Here, the source operand should be a symbol that indicates the size and location of the source 
value. The destination operand is then automatically selected to match the size of the source operand (the AL 
register for byte operands, AX for word operands, and EAX for doubleword operands). This explicit-operands form 
is provided to allow documentation; however, note that the documentation provided by this form can be 
misleading. That is, the source operand symbol must specify the correct type (size) of the operand (byte, word, or 
doubleword), but it does not have to specify the correct location. The location is always specified by the DS:(E)SI 
registers, which must be loaded correctly before the load string instruction is executed. 

The no-operands form provides "short forms" of the byte, word, and doubleword versions of the LODS instructions. 
Here also DS:(E)SI is assumed to be the source operand and the AL, AX, or EAX register is assumed to be the 
destination operand. The size of the source and destination operands is selected with the mnemonic: LODSB (byte 
loaded into register AL), LODSW (word loaded into AX), or LODSD (doubleword loaded into EAX). 

After the byte, word, or doubleword is transferred from the memory location into the AL, AX, or EAX register, the 
(E)SI register is incremented or decremented automatically according to the setting of the DF flag in the EFLAGS 
register. (If the DF flag is 0, the (E)SI register is incremented; if the DF flag is 1, the (E)SI register is decremented.) 
The (E)SI register is incremented or decremented by 1 for byte operations, by 2 for word operations, or by 4 for 
doubleword operations. 


In 64-bit mode, use of the REX.W prefix promotes operation to 64 bits. LODS/LODSQ load the quadword at address 
(R)SI into RAX. The (R)SI register is then incremented or decremented automatically according to the setting of 
the DF flag in the EFLAGS register. 
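The load-and-advance behavior described above, including the 64-bit quadword form, can be modeled in C as a sketch. The `LodsState` struct, the `lods` function, and the flat `mem` buffer standing in for the DS segment are hypothetical. The model assumes 64-bit mode (so a doubleword load zero-extends into RAX, as any 32-bit register write does there), ignores segmentation and address-size wraparound, and assembles bytes in little-endian order as x86 does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the state LODS touches (64-bit mode assumed). */
typedef struct {
    uint64_t rax;  /* AL/AX/EAX/RAX are the low 1/2/4/8 bytes of RAX */
    uint64_t rsi;  /* (R)SI: offset of the next element in mem[] */
    int df;        /* EFLAGS.DF: 0 = increment, 1 = decrement */
} LodsState;

/* One LODSB/LODSW/LODSD/LODSQ with element size 1, 2, 4, or 8 bytes. */
void lods(LodsState *s, const uint8_t *mem, size_t mem_len, unsigned size) {
    assert(size == 1 || size == 2 || size == 4 || size == 8);
    assert(s->rsi + size <= mem_len);

    /* Little-endian load of `size` bytes starting at (R)SI. */
    uint64_t value = 0;
    for (unsigned i = 0; i < size; i++)
        value |= (uint64_t)mem[s->rsi + i] << (8 * i);

    switch (size) {
    case 1: s->rax = (s->rax & ~0xFFull) | value; break;    /* AL: upper bits kept */
    case 2: s->rax = (s->rax & ~0xFFFFull) | value; break;  /* AX: upper bits kept */
    default: s->rax = value; break;  /* EAX (zero-extended in 64-bit mode) or RAX */
    }

    /* Advance (R)SI by the element size in the direction chosen by DF. */
    if (s->df == 0)
        s->rsi += size;
    else
        s->rsi -= size;
}
```

Byte and word loads leave the upper bits of RAX unchanged, which is why the model merges those cases instead of overwriting the whole register.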

The LODS, LODSB, LODSW, and LODSD instructions can be preceded by the REP prefix for block loads of ECX bytes, 
words, or doublewords. More often, however, these instructions are used within a LOOP construct because further 
processing of the data moved into the register is usually necessary before the next transfer can be made. See 
"REP/REPE/REPZ/REPNE/REPNZ—Repeat String Operation Prefix" in Chapter 4 of the Intel® 64 and IA-32 
Architectures Software Developer's Manual, Volume 2B, for a description of the REP prefix. 
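To show why REP is rarely useful with LODS, the sketch below models REP LODSB next to the more usual load-process-repeat pattern. Both functions are hypothetical C models with DF assumed to be 0; the running sum in `sum_bytes` merely stands in for whatever processing a real loop would do between loads.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of REP LODSB with DF = 0: the load repeats (R)CX times
 * and each load overwrites AL, so only the last byte survives. al_in is AL
 * before the instruction; it is returned unchanged if the count is 0. */
uint8_t rep_lodsb(const uint8_t *mem, size_t *rsi, uint64_t *rcx, uint8_t al_in) {
    uint8_t al = al_in;
    while (*rcx != 0) {
        al = mem[*rsi];  /* LODSB */
        *rsi += 1;       /* DF = 0: increment */
        *rcx -= 1;       /* REP decrements the count after each iteration */
    }
    return al;
}

/* The common pattern: load one element, process it, then repeat. */
uint32_t sum_bytes(const uint8_t *mem, size_t count) {
    assert(mem != NULL || count == 0);
    size_t rsi = 0;
    uint32_t sum = 0;
    for (size_t rcx = count; rcx != 0; rcx--) {
        uint8_t al = mem[rsi++];  /* LODSB with DF = 0 */
        sum += al;                /* use AL before the next load overwrites it */
    }
    return sum;
}
```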

Operation 

IF (byte load)
    THEN
        AL ← SRC; (* Byte load *)
        IF DF = 0
            THEN (E)SI ← (E)SI + 1;
            ELSE (E)SI ← (E)SI - 1;
        FI;
ELSE IF (word load)
    THEN
        AX ← SRC; (* Word load *)
        IF DF = 0
            THEN (E)SI ← (E)SI + 2;
            ELSE (E)SI ← (E)SI - 2;
        FI;
ELSE IF (doubleword load)
    THEN
        EAX ← SRC; (* Doubleword load *)
        IF DF = 0
            THEN (E)SI ← (E)SI + 4;
            ELSE (E)SI ← (E)SI - 4;
        FI;
ELSE IF (quadword load)
    THEN
        RAX ← SRC; (* Quadword load *)
        IF DF = 0
            THEN (R)SI ← (R)SI + 8;
            ELSE (R)SI ← (R)SI - 8;
        FI;
FI;

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS If a memory operand effective address is outside the SS segment limit. 

#UD If the LOCK prefix is used. 
Virtual-8086 Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made. 

#UD If the LOCK prefix is used. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. 

#UD If the LOCK prefix is used. 




Vol.2A 3-541 


INSTRUCTION SET REFERENCE, A-L 


LOOP/LOOPcc—Loop According to ECX Counter 


Opcode  Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
E2 cb   LOOP rel8    D      Valid        Valid            Decrement count; jump short if count ≠ 0.
E1 cb   LOOPE rel8   D      Valid        Valid            Decrement count; jump short if count ≠ 0 and ZF = 1.
E0 cb   LOOPNE rel8  D      Valid        Valid            Decrement count; jump short if count ≠ 0 and ZF = 0.


Instruction Operand Encoding 


Op/En  Operand 1  Operand 2  Operand 3  Operand 4
D      Offset     NA         NA         NA


Description 

Performs a loop operation using the RCX, ECX, or CX register as a counter (depending on whether the address size is 
64 bits, 32 bits, or 16 bits). Note that the LOOP instruction ignores REX.W, but in 64-bit mode the 64-bit address 
size can be overridden with a 67H prefix. 

Each time the LOOP instruction is executed, the count register is decremented, then checked for 0. If the count is 
0, the loop is terminated and program execution continues with the instruction following the LOOP instruction. If 
the count is not zero, a near jump is performed to the destination (target) operand, which is presumably the 
instruction at the beginning of the loop. 
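The decrement-then-test order has a consequence worth seeing in code: a count that starts at 0 wraps to the maximum value of its width instead of ending the loop. The sketch below models one counter step; `loop_step` is a hypothetical helper, and the counter is passed as a plain value of the selected width (CX, ECX, or RCX), so how hardware treats the bits of RCX outside that width is deliberately not modeled.

```c
#include <assert.h>
#include <stdint.h>

/* One LOOP counter update. *count holds CX, ECX, or RCX as selected by
 * addr_size (16, 32, or 64). The count is decremented first, wrapping
 * within its width, and only then tested. Returns 1 if the branch is taken. */
int loop_step(uint64_t *count, unsigned addr_size) {
    assert(addr_size == 16 || addr_size == 32 || addr_size == 64);
    uint64_t mask = (addr_size == 64) ? UINT64_MAX
                                      : (((uint64_t)1 << addr_size) - 1);
    *count = ((*count & mask) - 1) & mask;  /* decrement, wrapping at the width */
    return *count != 0;                     /* branch while the count is nonzero */
}
```

With CX starting at 0, for example, a loop body closed by LOOP runs 65,536 times.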

The target instruction is specified with a relative offset (a signed offset relative to the current value of the 
instruction pointer in the IP/EIP/RIP register). This offset is generally specified as a label in assembly code, but at the 
machine code level, it is encoded as a signed, 8-bit immediate value, which is added to the instruction pointer. 
Offsets of -128 to +127 are allowed with this instruction. 
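As a worked example of the rel8 encoding, the following sketch computes the branch target, assuming (as for other relative jumps) that the instruction pointer already holds the address of the instruction after LOOP. `loop_target` is a hypothetical helper. The two-byte encoding E2 FE, for instance, branches back to the LOOP instruction itself, because FEH sign-extends to -2.

```c
#include <assert.h>
#include <stdint.h>

/* Branch target for LOOP/LOOPcc: sign-extend rel8 and add it to the address
 * of the next instruction; truncate to IP (16-bit) or EIP (32-bit) width. */
uint64_t loop_target(uint64_t next_ip, uint8_t rel8, unsigned operand_size) {
    assert(operand_size == 16 || operand_size == 32 || operand_size == 64);
    int64_t disp = (rel8 < 0x80) ? (int64_t)rel8 : (int64_t)rel8 - 256;  /* -128..+127 */
    uint64_t target = next_ip + (uint64_t)disp;
    if (operand_size == 16)
        target &= 0xFFFF;
    else if (operand_size == 32)
        target &= 0xFFFFFFFF;
    return target;
}
```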

Some forms of the loop instruction (LOOPcc) also accept the ZF flag as a condition for terminating the loop before 
the count reaches zero. With these forms of the instruction, a condition code (cc) is associated with each instruction 
to indicate the condition being tested for. Here, the LOOPcc instruction itself does not affect the state of the ZF flag; 
the ZF flag is changed by other instructions in the loop. 
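The branch condition for the three forms, and a typical use in which a comparison inside the loop sets ZF, can be sketched as follows. `LoopKind`, `loop_branch_cond`, and `scan_until` are hypothetical; the scan imitates a CMP followed by LOOPNE and assumes a nonzero starting count.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { KIND_LOOP, KIND_LOOPE, KIND_LOOPNE } LoopKind;  /* LOOPZ = LOOPE, LOOPNZ = LOOPNE */

/* BranchCond after the counter has been decremented. zf is whatever earlier
 * instructions left in ZF; LOOPcc itself never modifies it. */
int loop_branch_cond(LoopKind kind, uint64_t count, int zf) {
    switch (kind) {
    case KIND_LOOPE:  return zf == 1 && count != 0;
    case KIND_LOOPNE: return zf == 0 && count != 0;
    default:          return count != 0;
    }
}

/* Hypothetical scan: stop at the first byte equal to key, or when the count
 * runs out. Returns the number of bytes examined. */
size_t scan_until(const uint8_t *p, uint64_t count, uint8_t key) {
    assert(count > 0);  /* with count 0, a real LOOPNE would wrap instead */
    size_t examined = 0;
    int zf;
    do {
        zf = (p[examined] == key);  /* CMP sets ZF = 1 on equality */
        examined++;
        count--;                    /* LOOPNE decrements before testing */
    } while (loop_branch_cond(KIND_LOOPNE, count, zf));
    return examined;
}
```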

Operation 

IF (AddressSize = 32)
    THEN Count is ECX;
ELSE IF (AddressSize = 64)
    THEN Count is RCX;
ELSE Count is CX;
FI;

Count ← Count - 1;

IF Instruction is not LOOP
    THEN
        IF (Instruction = LOOPE) or (Instruction = LOOPZ)
            THEN IF (ZF = 1) and (Count ≠ 0)
                    THEN BranchCond ← 1;
                    ELSE BranchCond ← 0;
                FI;
            ELSE (* Instruction = LOOPNE or LOOPNZ *)
                IF (ZF = 0) and (Count ≠ 0)
                    THEN BranchCond ← 1;
                    ELSE BranchCond ← 0;
                FI;
        FI;
    ELSE (* Instruction = LOOP *)
        IF (Count ≠ 0)
            THEN BranchCond ← 1;
            ELSE BranchCond ← 0;
        FI;
FI;

IF BranchCond = 1
    THEN
        IF OperandSize = 64
            THEN RIP ← RIP + SignExtend(DEST);
            ELSE EIP ← EIP + SignExtend(DEST);
        FI;
        IF OperandSize = 16
            THEN EIP ← EIP AND 0000FFFFH;
        FI;
        IF OperandSize = (32 or 64)
            THEN IF (R/E)IP < CS.Base or (R/E)IP > CS.Limit
                THEN #GP; FI;
        FI;
    ELSE
        Terminate loop and continue program execution at (R/E)IP;
FI;

Flags Affected 

None 

Protected Mode Exceptions 

#GP(0) If the offset being jumped to is beyond the limits of the CS segment. 

#UD If the LOCK prefix is used. 

Real-Address Mode Exceptions 

#GP If the offset being jumped to is beyond the limits of the CS segment or is outside of the effective address 
space from 0 to FFFFH. This condition can occur if a 32-bit address size override prefix is used. 

#UD If the LOCK prefix is used. 

Virtual-8086 Mode Exceptions 

Same exceptions as in real-address mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#GP(0) If the offset being jumped to is in a non-canonical form. 

#UD If the LOCK prefix is used. 


LSL—Load Segment Limit 


Opcode            Instruction        Op/En  64-Bit Mode  Compat/Leg Mode  Description
0F 03 /r          LSL r16, r16/m16   RM     Valid        Valid            Load: r16 ← segment limit, selector r16/m16.
0F 03 /r          LSL r32, r32/m16   RM     Valid        Valid            Load: r32 ← segment limit, selector r32/m16.
REX.W + 0F 03 /r  LSL r64, r32/m16   RM     Valid        Valid            Load: r64 ← segment limit, selector r32/m16.


NOTES: 


* For all loads (regardless of destination sizing), only bits 15-0 (the 16-bit segment selector) are used. Other bits are ignored. 


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2      Operand 3  Operand 4
RM     ModRM:reg (w)  ModRM:r/m (r)  NA         NA


Description 

Loads the unscrambled segment limit from the segment descriptor specified with the second operand (source 
operand) into the first operand (destination operand) and sets the ZF flag in the EFLAGS register. The source 
operand (which can be a register or a memory location) contains the segment selector for the segment descriptor 
being accessed. The destination operand is a general-purpose register. 

The processor performs access checks as part of the loading process. Once loaded in the destination register, 
software can compare the segment limit with the offset of a pointer. 

The segment limit is a 20-bit value contained in bytes 0 and 1 and in the first 4 bits of byte 6 of the segment 
descriptor. If the descriptor has a byte granular segment limit (the granularity flag is set to 0), the destination 
operand is loaded with a byte granular value (byte limit). If the descriptor has a page granular segment limit (the 
granularity flag is set to 1), the LSL instruction will translate the page granular limit (page limit) into a byte limit 
before loading it into the destination operand. The translation is performed by shifting the 20-bit "raw" limit left 12 
bits and filling the low-order 12 bits with 1s. 
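The limit extraction and page-to-byte translation described above can be sketched in C. This is a hypothetical helper (`lsl_byte_limit`) that assumes the descriptor is available as eight raw bytes in memory order; the position of the granularity flag (bit 7 of byte 6) is the standard descriptor layout rather than something stated in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Byte-granular limit as LSL reports it, from a raw 8-byte descriptor
 * (desc[0] is the lowest-addressed byte). Limit bits 15:0 are in bytes 0-1,
 * bits 19:16 in the low nibble of byte 6; G is bit 7 of byte 6. */
uint32_t lsl_byte_limit(const uint8_t desc[8]) {
    assert(desc != 0);
    uint32_t raw = (uint32_t)desc[0]
                 | ((uint32_t)desc[1] << 8)
                 | ((uint32_t)(desc[6] & 0x0F) << 16);  /* 20-bit raw limit */
    int g = (desc[6] >> 7) & 1;
    return g ? ((raw << 12) | 0xFFFu) : raw;  /* page limit -> byte limit */
}
```

With a 16-bit operand size only the low 16 bits of this value reach the destination, so a flat 4-GByte segment (limit FFFFFFFFH) reads back as FFFFH.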

When the operand size is 32 bits, the 32-bit byte limit is stored in the destination operand. When the operand size 
is 16 bits, a valid 32-bit limit is computed; however, the upper 16 bits are truncated and only the low-order 16 bits 
are loaded into the destination operand. 

This instruction performs the following checks before it loads the segment limit into the destination register: 

• Checks that the segment selector is not NULL. 

• Checks that the segment selector points to a descriptor that is within the limits of the GDT or LDT being 
accessed. 

• Checks that the descriptor type is valid for this instruction. All code and data segment descriptors are valid for 
(can be accessed with) the LSL instruction. The valid special segment and gate descriptor types are given in the 
following table. 

• If the segment is not a conforming code segment, the instruction checks that the specified segment descriptor 
is visible at the CPL (that is, that the CPL and the RPL of the segment selector are less than or equal to the DPL of 
the segment descriptor). 

If the segment descriptor cannot be accessed or is an invalid type for the instruction, the ZF flag is cleared and no 
value is loaded in the destination operand. 
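The selector and privilege checks listed above can be sketched as two small predicates. These are hypothetical helpers: they assume the caller has already chosen the GDT or LDT from the selector's TI bit and supplies that table's limit, and they leave out the descriptor-type check.

```c
#include <assert.h>
#include <stdint.h>

/* Selector checks: not NULL (index 0 in the GDT) and the whole 8-byte
 * descriptor lies within the selected table's limit. */
int lsl_selector_ok(uint16_t sel, uint32_t table_limit) {
    if ((sel & 0xFFFC) == 0)
        return 0;                        /* NULL selector */
    uint32_t offset = sel & 0xFFF8;      /* index * 8 */
    return offset + 7 <= table_limit;
}

/* Visibility check: skipped for conforming code segments, otherwise
 * both CPL and RPL must be <= DPL. */
int lsl_visible(unsigned cpl, unsigned rpl, unsigned dpl, int conforming) {
    assert(cpl <= 3 && rpl <= 3 && dpl <= 3);
    if (conforming)
        return 1;
    return cpl <= dpl && rpl <= dpl;
}
```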
Table 3-55. Segment and Gate Descriptor Types 

       Protected Mode                    IA-32e Mode
Type   Name                      Valid   Name                                   Valid
0      Reserved                  No      Upper 8 bytes of a 16-byte descriptor  Yes
1      Available 16-bit TSS      Yes     Reserved                               No
2      LDT                       Yes     LDT                                    Yes
3      Busy 16-bit TSS           Yes     Reserved                               No
4      16-bit call gate          No      Reserved                               No
5      16-bit/32-bit task gate   No      Reserved                               No
6      16-bit interrupt gate     No      Reserved                               No
7      16-bit trap gate          No      Reserved                               No
8      Reserved                  No      Reserved                               No
9      Available 32-bit TSS      Yes     64-bit TSS                             Yes
A      Reserved                  No      Reserved                               No
B      Busy 32-bit TSS           Yes     Busy 64-bit TSS                        Yes
C      32-bit call gate          No      64-bit call gate                       No
D      Reserved                  No      Reserved                               No
E      32-bit interrupt gate     No      64-bit interrupt gate                  No
F      32-bit trap gate          No      64-bit trap gate                       No
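Table 3-55 can be encoded as a pair of bit masks for the system-descriptor case. This is a sketch; the mask names and `lsl_type_valid` are hypothetical, and `s_flag` stands for the descriptor's S bit (1 for code and data segments, which LSL always accepts).

```c
#include <assert.h>
#include <stdint.h>

/* Bit i set => system-descriptor type i is valid for LSL (Table 3-55). */
static const uint16_t LSL_VALID_PROTECTED =
    (1u << 0x1) | (1u << 0x2) | (1u << 0x3) | (1u << 0x9) | (1u << 0xB);
static const uint16_t LSL_VALID_IA32E =
    (1u << 0x0) | (1u << 0x2) | (1u << 0x9) | (1u << 0xB);

int lsl_type_valid(int s_flag, unsigned type, int ia32e) {
    assert(type <= 0xF);
    if (s_flag)
        return 1;  /* code or data segment: always valid */
    uint16_t mask = ia32e ? LSL_VALID_IA32E : LSL_VALID_PROTECTED;
    return (mask >> type) & 1;
}
```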


Operation 

IF SRC(Offset) > descriptor table limit
    THEN ZF ← 0;
    ELSE
        Read segment descriptor;
        IF SegmentDescriptor(Type) ≠ conforming code segment
        and ((CPL > DPL) OR (RPL > DPL))
        or Segment type is not valid for instruction
            THEN
                ZF ← 0;
            ELSE
                temp ← SegmentLimit([SRC]);
                IF (G = 1)
                    THEN temp ← ShiftLeft(12, temp) OR 00000FFFH;
                FI;
                IF OperandSize = 32
                    THEN DEST ← temp;
                ELSE IF OperandSize = 64 (* REX.W used *)
                    THEN DEST ← ZeroExtend(temp);
                ELSE (* OperandSize = 16 *)
                    DEST ← temp AND FFFFH;
                FI;
                ZF ← 1;
        FI;
FI;

Flags Affected 

The ZF flag is set to 1 if the segment limit is loaded successfully; otherwise, it is set to 0. 
Protected Mode Exceptions 

#GP(0) If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and the memory operand effective address is unaligned while the current privilege level is 3. 

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions 

#UD The LSL instruction cannot be executed in real-address mode. 

Virtual-8086 Mode Exceptions 

#UD The LSL instruction cannot be executed in virtual-8086 mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If the memory operand effective address referencing the SS segment is in a non-canonical form. 

#GP(0) If the memory operand effective address is in a non-canonical form. 

#PF(fault-code) If a page fault occurs. 

#AC(0) If alignment checking is enabled and the memory operand effective address is unaligned while the current privilege level is 3. 

#UD If the LOCK prefix is used. 
LTR—Load Task Register 


Opcode    Instruction  Op/En  64-Bit Mode  Compat/Leg Mode  Description
0F 00 /3  LTR r/m16    M      Valid        Valid            Load r/m16 into task register.


Instruction Operand Encoding 


Op/En  Operand 1      Operand 2  Operand 3  Operand 4
M      ModRM:r/m (r)  NA         NA         NA


Description 

Loads the source operand into the segment selector field of the task register. The source operand (a 
general-purpose register or a memory location) contains a segment selector that points to a task state segment (TSS). 
After the segment selector is loaded in the task register, the processor uses the segment selector to locate the 
segment descriptor for the TSS in the global descriptor table (GDT). It then loads the segment limit and base 
address for the TSS from the segment descriptor into the task register. The task pointed to by the task register is 
marked busy, but a switch to the task does not occur. 

The LTR instruction is provided for use in operating-system software; it should not be used in application programs. 
It can only be executed in protected mode when the CPL is 0. It is commonly used in initialization code to establish 
the first task to be executed. 

The operand-size attribute has no effect on this instruction. 

In 64-bit mode, the operand size is still fixed at 16 bits. The instruction references a 16-byte descriptor to load the 
64-bit base. 

Operation 

IF SRC is a NULL selector
    THEN #GP(0); FI;

IF SRC(Offset) > descriptor table limit OR SRC(type) ≠ global
    THEN #GP(segment selector); FI;

Read segment descriptor;

IF segment descriptor is not for an available TSS
    THEN #GP(segment selector); FI;

IF segment descriptor is not present
    THEN #NP(segment selector); FI;

TSSsegmentDescriptor(busy) ← 1;
(* Locked read-modify-write operation on the entire descriptor when setting busy flag *)

TaskRegister(SegmentSelector) ← SRC;
TaskRegister(SegmentDescriptor) ← TSSSegmentDescriptor;
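The checks and the busy-flag update above can be sketched in C for legacy protected mode (not IA-32e mode, where descriptors are 16 bytes and only type 9 is accepted). The `LtrResult` codes and `ltr_check_and_mark` are hypothetical. The model treats the GDT as a raw byte array, uses the standard layout of descriptor byte 5 (type in bits 3:0, S in bit 4, P in bit 7), and does not model the locked read-modify-write hardware performs.

```c
#include <assert.h>
#include <stdint.h>

typedef enum { LTR_OK, LTR_GP0, LTR_GP_SEL, LTR_NP_SEL } LtrResult;  /* hypothetical */

LtrResult ltr_check_and_mark(uint8_t *gdt, uint32_t gdt_limit,
                             uint16_t sel, unsigned cpl) {
    assert(cpl <= 3);
    if (cpl != 0)
        return LTR_GP0;                               /* privileged instruction */
    if ((sel & 0xFFFC) == 0)
        return LTR_GP0;                               /* NULL selector */
    if ((sel & 0x4) || (uint32_t)(sel & 0xFFF8) + 7 > gdt_limit)
        return LTR_GP_SEL;                            /* LDT, or beyond GDT limit */

    uint8_t *desc = gdt + (sel & 0xFFF8);
    unsigned type = desc[5] & 0x0F;
    int s_flag = (desc[5] >> 4) & 1;
    if (s_flag || (type != 0x1 && type != 0x9))
        return LTR_GP_SEL;                            /* not an available TSS */
    if (!(desc[5] & 0x80))
        return LTR_NP_SEL;                            /* TSS not present */

    desc[5] |= 0x02;  /* set busy flag: type 1 -> 3, type 9 -> B */
    return LTR_OK;
}
```

A second LTR with the same selector fails with #GP(selector), because the TSS is now marked busy.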

Flags Affected 

None 
Protected Mode Exceptions 

#GP(0) If the current privilege level is not 0. 

If a memory operand effective address is outside the CS, DS, ES, FS, or GS segment limit. 

If the source operand contains a NULL segment selector. 

If the DS, ES, FS, or GS register is used to access memory and it contains a NULL segment selector. 

#GP(selector) If the source selector points to a segment that is not a TSS or to one for a task that is already busy. 

If the selector points to LDT or is beyond the GDT limit. 

#NP(selector) If the TSS is marked not present. 

#SS(0) If a memory operand effective address is outside the SS segment limit. 

#PF(fault-code) If a page fault occurs. 

#UD If the LOCK prefix is used. 


Real-Address Mode Exceptions 

#UD The LTR instruction is not recognized in real-address mode. 

Virtual-8086 Mode Exceptions 

#UD The LTR instruction is not recognized in virtual-8086 mode. 

Compatibility Mode Exceptions 

Same exceptions as in protected mode. 

64-Bit Mode Exceptions 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#GP(0) If the current privilege level is not 0. 

If the memory address is in a non-canonical form. 

If the source operand contains a NULL segment selector. 

#GP(selector) If the source selector points to a segment that is not a TSS or to one for a task that is already busy. 

If the selector points to LDT or is beyond the GDT limit. 

If the descriptor type of the upper 8 bytes of the 16-byte descriptor is non-zero. 

#NP(selector) If the TSS is marked not present. 

#PF(fault-code) If a page fault occurs. 

#UD If the LOCK prefix is used. 
LZCNT—Count the Number of Leading Zero Bits 


Opcode/Instruction  Op/En  64/32-bit Mode  CPUID Feature Flag  Description
F3 0F BD /r         RM     V/V             LZCNT               Count the number of leading zero bits in r/m16, return result in r16.
LZCNT r16, r/m16
F3 0F BD /r         RM     V/V             LZCNT               Count the number of leading zero bits in r/m32, return result in r32.
LZCNT r32, r/m32
F3 REX.W 0F BD /r   RM     V/N.E.          LZCNT               Count the number of leading zero bits in r/m64, return result in r64.
LZCNT r64, r/m64





Instruction Operand Encoding 


Op/En  Operand 1      Operand 2      Operand 3  Operand 4
RM     ModRM:reg (w)  ModRM:r/m (r)  NA         NA


Description 

Counts the number of leading most significant zero bits in the source operand (second operand) and returns the 
result in the destination operand (first operand). 

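LZCNT differs from BSR: BSR returns the bit index of the most significant set bit, whereas LZCNT returns a count of 
leading zero bits. For example, when the source operand is zero, LZCNT returns the operand size, while BSR leaves 
its destination undefined. Note that on processors that do not support LZCNT, this instruction byte encoding is 
executed as BSR. 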
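The difference can be made concrete in portable C. For a nonzero 32-bit source, LZCNT equals 31 minus the BSR result, and only LZCNT defines a result for zero. The sketch uses hypothetical functions (`bsr32`, `lzcnt32`) rather than the instructions themselves. It also shows why LZCNT-dependent code misbehaves on a processor without LZCNT: the same bytes execute as BSR and yield the bit index instead of the count.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the most significant set bit, as BSR reports it. BSR's
 * destination is undefined for a zero source, so 0 is rejected here. */
unsigned bsr32(uint32_t x) {
    assert(x != 0);
    unsigned i = 31;
    while (((x >> i) & 1u) == 0)
        i--;
    return i;
}

/* LZCNT for a 32-bit operand: 32 for a zero source, else 31 - BSR(x). */
unsigned lzcnt32(uint32_t x) {
    return (x == 0) ? 32u : 31u - bsr32(x);
}
```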

In 64-bit mode, a 64-bit operand size requires REX.W = 1. 

Operation 

temp ← OperandSize - 1
DEST ← 0

WHILE (temp >= 0) AND (Bit(SRC, temp) = 0)
DO
    temp ← temp - 1
    DEST ← DEST + 1
OD

IF DEST = OperandSize
    CF ← 1
ELSE
    CF ← 0
FI

IF DEST = 0
    ZF ← 1
ELSE
    ZF ← 0
FI
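A direct C transcription of the Operation pseudocode, including the CF and ZF updates, is sketched below. `LzcntResult` and `lzcnt_model` are hypothetical names; the source value is passed in the low `operand_size` bits of a 64-bit integer.

```c
#include <assert.h>
#include <stdint.h>

typedef struct { unsigned count; int cf; int zf; } LzcntResult;

/* Mirrors the Operation pseudocode for an operand size of 16, 32, or 64. */
LzcntResult lzcnt_model(uint64_t src, unsigned operand_size) {
    assert(operand_size == 16 || operand_size == 32 || operand_size == 64);
    LzcntResult r = {0, 0, 0};
    int temp = (int)operand_size - 1;
    while (temp >= 0 && ((src >> temp) & 1u) == 0) {
        temp--;
        r.count++;  /* DEST <- DEST + 1 */
    }
    r.cf = (r.count == operand_size);  /* source was zero */
    r.zf = (r.count == 0);             /* most significant bit was set */
    return r;
}
```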

Flags Affected 

The ZF flag is set to 1 if the result is zero (that is, the most significant bit of the source is set), and to 0 otherwise. 
The CF flag is set to 1 if the input was zero and cleared otherwise. The OF, SF, PF, and AF flags are undefined. 
Intel C/C++ Compiler Intrinsic Equivalent 

LZCNT: unsigned __int32 _lzcnt_u32(unsigned __int32 src); 

LZCNT: unsigned __int64 _lzcnt_u64(unsigned __int64 src); 
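A portable wrapper around these intrinsics might look like the sketch below. It assumes GCC or Clang: when the compiler targets LZCNT (these compilers define `__LZCNT__`, for example with `-mlzcnt`) the intrinsics from `<immintrin.h>` are used; otherwise it falls back to the builtins `__builtin_clz`/`__builtin_clzll`, which are undefined for 0, so zero is handled explicitly to match LZCNT's rule of returning the operand size. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__LZCNT__)
#include <immintrin.h>
#endif

unsigned lzcnt32_portable(uint32_t x) {
#if defined(__LZCNT__)
    return _lzcnt_u32(x);
#else
    assert(sizeof(unsigned int) == 4);  /* __builtin_clz counts within an unsigned int */
    return (x == 0) ? 32u : (unsigned)__builtin_clz(x);
#endif
}

unsigned lzcnt64_portable(uint64_t x) {
#if defined(__LZCNT__) && defined(__x86_64__)
    return (unsigned)_lzcnt_u64(x);     /* 64-bit form exists only in 64-bit mode */
#else
    assert(sizeof(unsigned long long) == 8);
    return (x == 0) ? 64u : (unsigned)__builtin_clzll(x);
#endif
}
```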

Protected Mode Exceptions 

#GP(0) For an illegal memory operand effective address in the CS, DS, ES, FS or GS segments. 

If the DS, ES, FS, or GS register is used to access memory and it contains a null segment 
selector. 

#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. 

Real-Address Mode Exceptions 

#GP(0) If any part of the operand lies outside of the effective address space from 0 to FFFFH. 

#SS(0) For an illegal address in the SS segment. 

Virtual 8086 Mode Exceptions 

#GP(0) If any part of the operand lies outside of the effective address space from 0 to FFFFH. 

#SS(0) For an illegal address in the SS segment. 

#PF(fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. 


Compatibility Mode Exceptions 

Same exceptions as in Protected Mode. 


64-Bit Mode Exceptions 


#GP(0) If the memory address is in a non-canonical form. 

#SS(0) If a memory address referencing the SS segment is in a non-canonical form. 

#PF(fault-code) For a page fault. 

#AC(0) If alignment checking is enabled and an unaligned memory reference is made while the current privilege level is 3. 